 I'm Robert Edmonds. This is Paul Vixie. If you came to see him, you might be disappointed, because he's got a little ten-minute thing at the end. The talk we're covering today involves passive DNS. Hopefully most of the attendees here are familiar with the DNS in general. Can I see a show of hands? People who know what the DNS is. Very good. How many have read one or more DNS RFCs? Excellent. How many of you have heard of passive DNS? I'll probably skip over the introductory material as fast as possible because I have way too much material. I'd like to get to the demos, but in order to understand why the demos are so cool, you do need a certain amount of depth. Let's get started. This is the structure of the talk. There are four parts. I'll skip that. The DNS maps host names to IP addresses; this is generally what we consider it to be. In fact, it's more of a generalized system. It's basically a distributed key-value system. More generally, it maps (key, type) tuples to a set of unordered values: basically, a multi-valued, distributed key-value store. Excellent. Some terminology. We have clients, caches, and content; there are three fairly well-defined roles in the DNS. Clients request full resolution service from caches. Caches make zero or more queries to content servers on behalf of clients, and the results are cached. We have content name servers, which serve the DNS records that have been delegated to them. Here's a PowerPoint diagram showing the hourglass nature of this delineation. You can see that an ISP might have millions of clients. These clients are everywhere: phones, laptops, whatnot. There are millions of content name servers, because there are millions of domain names that people want to look up. But you have very few DNS caches; even the largest ISPs have around 100 DNS caches or fewer. So we have this natural choke point where we can insert a monitoring application. In the DNS, as I just explained, we have two different protocols. 
One that the client speaks to the cache, and one that the cache speaks to the content name servers. They have fairly different semantics, but they use the same wire protocol. The DNS is commonly considered to be just port 53, but it's terrifically complicated. The two protocols are the client-server protocol and the inter-server protocol, spoken between caches and content name servers. Passive DNS focuses on the latter, because we don't want to see client queries or give the impression that we are spying on people. In fact, what we want to do is gather intelligence about the domain name system rather than about the clients that are requesting that information. Passive DNS replication was invented by Florian Weimer about six years ago, who found a variety of uses for the technique. The most impressive use of it to date has been in combating malware and e-crime. Domain names are so cheap as to be basically free, but infrastructure and numbering resources are not, and the sharing of that infrastructure to host multiple e-crime campaigns makes it possible to track and link malicious uses of the DNS. Just to give one example, fast-flux botnets light up in passive DNS like a Christmas tree. Passive DNS replication basically consists of a number of passive sensors that listen to the packets that are generated when a DNS cache performs a lookup. The packets are then forwarded to a central collection point for analysis. After the packets captured by a passive DNS sensor are submitted to the collection point, they're parsed, analyzed, and reduced into a stream of individual records that we then permanently store in a database. And here's a picture of our DNS architecture again, where our caches talk to content servers. We've taken the clients out of the picture, since they're not particularly relevant to passive DNS, and we've inserted a sensor between the caches and content, showing how passive DNS monitors the inter-server traffic and forwards it to a collection point. 
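The "multi-valued, distributed key-value store" view of the DNS described a moment ago can be sketched in a few lines of Python. This is purely an illustration of the data model; the class, names, and addresses below are made up, and no real resolver works this way internally:

```python
# Toy model of the DNS data model: keys are (name, type) tuples,
# values are unordered sets of record data (i.e. RRsets).
from collections import defaultdict

class ToyDNSStore:
    def __init__(self):
        # (name, rrtype) -> set of rdata values
        self._data = defaultdict(set)

    def add(self, name, rrtype, rdata):
        # DNS names are case-insensitive, so normalize the key
        self._data[(name.lower(), rrtype)].add(rdata)

    def lookup(self, name, rrtype):
        # returns an unordered set, matching DNS RRset semantics
        return self._data.get((name.lower(), rrtype), set())

store = ToyDNSStore()
store.add("www.example.com", "A", "192.0.2.1")
store.add("www.example.com", "A", "192.0.2.2")
assert store.lookup("WWW.EXAMPLE.COM", "A") == {"192.0.2.1", "192.0.2.2"}
```

The point is only that the lookup key is a (name, type) pair and the value is an unordered set, which is exactly an RRset.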
And that collection point is called SIE, or the Security Information Exchange. We'll cover that as briefly as we can, because the main thrust of this talk concerns the database that we build based on that data. So there have been a number of different implementations of passive DNS over the years. The original was Florian Weimer's dnslogger, which was initially hosted at RUS-CERT and then at BFK. Florian's implementation uses a custom libpcap forwarder on the passive DNS sensor and a Berkeley DB database for permanently storing that data. Then we have Bojan's dnsparse out of New Zealand, which uses a fairly primitive tcpdump-based sensor and a MySQL database. And the most recent, and to our knowledge the largest, implementation is in ISC's Security Information Exchange, which inserts an elaborate distribution layer between the passive DNS sensor network and the analyzers of the captured data. We use a custom analyzer that is hardened and more advanced than the analyzers used in dnslogger or dnsparse, and our storage components have been similarly hardened and have advanced functionality for dealing with the large amount of data that we collect. There may be other implementations, but I don't know of any other publicly known passive DNS implementations. This brings us to ISC SIE, which we'll describe briefly, covering only the features that are relevant to passive DNS. So SIE is a distribution network, as I said, for replicating security data from multiple sources to private secure VLANs, where the data is broadcast to trusted consumers, including ISC itself, of course. One of those types of data is, of course, passive DNS. Passive DNS sensor operators run sensor code that captures DNS traffic and periodically uploads it in batches to our submission servers. Our submission servers take care of replicating the data among physical SIE sites, where the data is broadcast out in a standardized format on private VLANs: sort of a poor man's multicast. 
And we use a standardized interchange format called NMSG, which we use to encapsulate the data. It's an extensible binary format that we've optimized for real-time transmission on jumbo-frame Ethernet, and we use this format to record the raw packets that are captured by our passive DNS sensors. This allows us to add a good deal of metadata along with the raw messages. So, for instance, instead of storing a flat stream of packets like your typical pcap file, we embed both the query and its corresponding response into a single message, which we can then process as a unit rather than groveling through an entire pcap file. We don't really need to cover the NMSG format any more here; you can see our Google Tech Talk, where we describe just that format, at that URL. And this is the end of our introductory material, and we'll move on to some ancient DNS security issues and the analogues that they present in passive DNS. Unfortunately, nothing short of universal DNSSEC deployment really helps us, since passive DNS operates at a lower level than DNSSEC. We have to be able to handle signed and unsigned data in as secure a fashion as possible, so we won't be discussing DNSSEC further. So what DNS security issues are most important to passive DNS? We have Kashpureff poisoning, which is an old, easily defeated type of DNS cache poisoning from the 90s. And we also have Kaminsky poisoning, which is a particular type of DNS cache poisoning that makes use of spoofed response packets. Actually, spoofed responses in general are very harmful to passive DNS data collection. Let's see. Kashpureff poisoning is the name of a type of DNS cache poisoning that occurs when a content name server appends extra records to a response. It's particularly insidious because an attacker can poison any record in the entire DNS tree. It just requires running a content name server and tricking clients into looking up a DNS name under the control of the attacker. So that's very easy to do. 
Here's how the attack works. The attacker runs a content name server, and a client is enticed to look up a domain name under the attacker's control. The cache contacts the attacker's name server, and the attacker's name server provides extra records to the cache, and these extra records are inserted into the cache instead of being discarded. I'm going to skip the actual example because it takes a while to explain, but the idea is that we have a malicious.example.com name, and we trick someone into looking it up, and suddenly this unrelated name www.example.net points anywhere we want. So the blue one and the red one: the red one is malicious, and the blue one is suspicious but not illegitimate. Actually, let me go back. The gray record there is the malicious one, and it gets scrubbed out by modern DNS caches. I think we can sort of skip this slide. Let's skip this one. And we can skip this one. Ah, okay. Now that we've covered all the background material, that has been the lead-up to the meat of the talk: how is this particularly relevant to passive DNS? Florian Weimer, in his 2005 paper, identified several problems with validating the data collected by a passive DNS sensor, but he chose not to implement protections against these problems. We would like to fix these problems if possible. So we have two problems in passive DNS that are sort of analogous to the two problems in active DNS, and the problem with Kashpureff poisoning in passive DNS is that we can't see the DNS cache's internal state, so we can't recover what's called the bailiwick that the cache associated with a particular response. 
The bailiwick is entirely in the eye of the beholder, and this leads to a problem called record injection. The problem with spoofed responses is that they're treated just like normal responses by the passive DNS sensor, and a single spoofed response can poison the passive DNS database. This worries us, because it makes passive DNS databases sort of an unreliable tool if an attacker explicitly wants to target a passive DNS database. So anyway, we'd like to make sure that passive DNS is at least as reliable as active DNS as far as resilience to cache poisoning goes, for whatever that's worth, in order to avoid allowing passive DNS to itself become a target in addition to the DNS. So let's move on to the protections that we've implemented on top of the basic passive DNS idea. It's actually relatively easy to protect the capture stage from spoofing. In order to make our DNS sensor at least as reliable as the DNS cache that it's monitoring, we need to capture both the outgoing queries that the cache is generating and the incoming responses that it gets back. Current generations of passive DNS only capture the responses; they don't have the context or the state that is generated by the actual query that went out. So what we need to do is capture the outgoing queries, correlate them with the responses, and verify that they fit together and form a valid transaction. In most cases we need to verify that nine different fields between the query and response are identical. We don't have time to discuss those nine fields, but basically we match them together, and we can tell exactly which transactions are valid and which ones lack the additional state needed to verify that they're valid. So ISC has released a tool called dnsqr that performs DNS packet capture and validates the query/response pairs that are seen on the wire. 
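The query/response validation just described can be sketched as follows. The talk doesn't enumerate the nine matched fields, so the field list below is our assumption, based on what a transaction must agree on: the protocol, the two address/port pairs (reversed in the response), the DNS message ID, and the question tuple. Treat this as a sketch of the idea, not of dnsqr's actual implementation:

```python
# Sketch of dnsqr-style query/response pairing using a state table.
from collections import namedtuple

Packet = namedtuple("Packet",
    "proto src_ip dst_ip src_port dst_port dns_id qname qtype qclass")

def query_key(q):
    return (q.proto, q.src_ip, q.dst_ip, q.src_port, q.dst_port,
            q.dns_id, q.qname.lower(), q.qtype, q.qclass)

def response_key(r):
    # a response travels in the reverse direction, so swap the
    # address/port pairs before looking up the outstanding query
    return (r.proto, r.dst_ip, r.src_ip, r.dst_port, r.src_port,
            r.dns_id, r.qname.lower(), r.qtype, r.qclass)

class QRStateTable:
    def __init__(self):
        self._pending = {}

    def saw_query(self, q):
        # remember the outgoing query until its response arrives
        self._pending[query_key(q)] = q

    def saw_response(self, r):
        # returns the matching query, or None for an unsolicited
        # (possibly spoofed) response
        return self._pending.pop(response_key(r), None)

table = QRStateTable()
q = Packet("udp", "198.51.100.1", "192.0.2.53", 40000, 53,
           0x1234, "example.com", "A", "IN")
table.saw_query(q)
good = Packet("udp", "192.0.2.53", "198.51.100.1", 53, 40000,
              0x1234, "example.com", "A", "IN")
spoof = good._replace(dns_id=0x9999)
assert table.saw_response(spoof) is None   # wrong ID: rejected
assert table.saw_response(good) is q       # valid transaction
```

A response that matches no outstanding query on every field simply never enters the data stream, which is the whole anti-spoofing point.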
dnsqr temporarily keeps each outgoing query packet in memory in a state table, and when a response packet is received, we do a lookup against that state table; the state table is keyed on that nine-tuple that we just skipped over. We can skip this; it's interesting, but we don't have enough time. There's blah, blah, blah. There's IP reassembly, which I won't explain why that's cool. So now that we've eliminated this spoofing vulnerability in the capture stage, we can move on to the data that has been captured and has passed all these checks, but happens to be attempting to redirect us, like the Kashpureff-style poisoning. A cache internally associates a bailiwick with each outgoing query, and a bailiwick is sort of equivalent to a zone; not really, but it's a good enough approximation for this talk. So Kashpureff poisoning is pretty much identical between active DNS and passive DNS, and the solution is that we keep track of which zone, or bailiwick, we expect the answer to be contained in, and we discard any response records that aren't contained in that zone. Of course, nothing in the wire protocol indicates the bailiwick that the cache is expecting. So when we're looking at a particular DNS query/response pair, we don't see anything in the packets that helps us determine the bailiwick either, so we have to determine it ourselves from first principles. That's a relatively minor point. So we have to develop an algorithm that operates entirely passively. It's not allowed to make any queries of its own; it has to passively process each message and tell us, for each record name in the response, whether the IP address that sent us this response is really allowed to assert knowledge for that name. And this algorithm must provide a true or false value; it can't be a heuristic. We can tolerate a small number of false negatives if we have incomplete knowledge; we cannot tolerate any false positives. 
So basically, the passive DNS bailiwick algorithm analyzes a DNS response and answers the question, for each record name: is the response IP address a name server for the zone that contains, or can contain, this name? For example, the root name servers can assert knowledge about any domain name and pass the bailiwick verification algorithm. And the gTLD servers, the .com and .net servers operated by VeriSign, can assert knowledge about any domain name that ends in .com or .net; in fact, they did so for a short period in 2003, in what was called VeriSign Site Finder. Very good. So here's how the algorithm works. First we allocate a whole bunch of memory that will be used to cache the name server and address records after they've been verified by the algorithm, and we call this cache the bailiwick cache. We initialize this cache using the entire root zone, which gives our algorithm knowledge of where the root and TLD name servers are, and thus it knows the bailiwicks of the root and all the TLDs. 
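Here is a small sketch of a bailiwick cache as just described: it is seeded from the root zone and answers the strict yes/no question of whether a responding server address may speak for a name. This is our own illustration, not ISC's code; 198.41.0.4 (a.root-servers.net) and 192.5.6.30 (a.gtld-servers.net) are real addresses used for flavor, and the others come from documentation ranges:

```python
# Sketch of a passive bailiwick cache: zone -> set of name server
# addresses known to be authoritative for that zone.
class BailiwickCache:
    def __init__(self, root_zone):
        # root_zone: {zone_name: set_of_ns_addresses}, e.g. built
        # from the root zone file (root and TLD servers)
        self.zones = {z.lower(): set(a) for z, a in root_zone.items()}

    def possible_zones(self, name):
        # every suffix of the name, from most to least specific,
        # plus the root zone itself
        labels = name.lower().rstrip(".").split(".")
        for i in range(len(labels)):
            yield ".".join(labels[i:])
        yield "."

    def verify(self, name, server_addr):
        # True only if server_addr is a known name server for some
        # zone that contains (or could contain) this name.  Strict:
        # false negatives are tolerable, false positives are not.
        return any(server_addr in self.zones.get(z, ())
                   for z in self.possible_zones(name))

    def learn(self, zone, ns_addr):
        # called only after a record has itself passed verification
        self.zones.setdefault(zone.lower(), set()).add(ns_addr)

cache = BailiwickCache({".": {"198.41.0.4"}, "com": {"192.5.6.30"}})
# a root server may assert knowledge of any name:
assert cache.verify("www.example.com", "198.41.0.4")
# a random host may not:
assert not cache.verify("www.example.com", "203.0.113.7")
# after verifying example.com's NS/A records, its server is accepted too:
cache.learn("example.com", "192.0.2.53")
assert cache.verify("mail.example.com", "192.0.2.53")
```

The `learn` step is what lets the cache walk down the tree: each verified NS or A record widens the set of servers the algorithm will later accept answers from.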
Then, when we want to verify a given name against a given name server address, we find all the possible zones the name could be located in, and check whether the name server address in the response packet matches any of the addresses that are name servers for those possible zones. And finally, each time the algorithm successfully verifies an NS or A record, that record is inserted into the bailiwick cache. We're going to skip through the algorithm examples, because they take a really long time to explain and we want to get to the demo. Okay. What we call DNSDB is basically a database for permanently storing DNS records from a variety of sources, along with a certain amount of metadata such as timestamps. Our primary data sources are passive DNS data as well as a number of zone files that we have access to. Each record is serialized as an array of bytes, which we then store in an Apache Cassandra database; we don't have enough time to go over the exact serialization scheme, unfortunately, because it's really neat. We chose Apache Cassandra for a number of reasons. In particular, it's a distributed key-value store that scales easily as the size of the database grows, and its data model provides a good fit for the type of data that we want to store. It's also extremely fast and can easily keep up with the amount of data that we need to process, primarily due to the fact that Cassandra always performs sequential writes, so it can make use of cheap, slow SATA storage. We then export the data stored in our DNS database via an HTTP API, intended both for bulk queries and for an interactive web search interface. Since we started loading data about a month ago, the size of our database has grown to about 500 gigs, out of 27 terabytes of total storage that we've deployed; we just bought some small little 2U boxes to test the thing out. DNSDB consists of several loosely coupled components. We have a number of data sources from which we receive new DNS data. The first and most important source of DNS data comes from a program called nmsg-dnscache, which processes the raw passive DNS data as it comes in from our sensors; this is after it's passed the query/response state inspection, and this program then performs the passive bailiwick algorithm that we described earlier as the data comes in from the sensors. The second source of data comes from the gTLDs that make daily zone file dumps available. So, for instance, you can go to VeriSign or Afilias and get access to daily dumps of their entire zones, because of the contracts they have with ICANN that require them to do so. And the third source of data comes from DNS zones whose administrators allow us to perform zone transfers of their zones; this is our smallest source of data, mostly just the isc.org zone, and we're interested in adding more data via this method, because it's very cheap and easy to process. Then we have a set of data loaders that process the passive DNS data into batches of individual database rows and columns and connect to our Cassandra cluster in order to insert the processed data. So we have nmsg-dnscache; I'm going to quickly go through what it does, and then we'll get into the demo as soon as possible. It's a standalone program that reads the raw passive DNS data and parses each DNS message into a stream of individual resource record sets, and then we apply a series of filters before inserting each RRset into an in-memory cache. These filters eliminate roughly 50% of the data that would otherwise be stored in the database. RRsets that pass all the filters are inserted into an in-memory cache, which we've tuned to store about 8 to 12 hours' worth of data, and finally the cache is expired in a strict first-in, first-out order, and these expired RRsets are sent over a socket so the output can be stored for later processing and insertion into the DNS database. I think I covered this. So we sign up for the zone file 
programs, and so we get the zone file dumps, but these are just the NS delegations. So in the .com zone we see google.com and its name server records, but that zone doesn't include all the records that google.com wants to serve, obviously, since the DNS is hierarchical and distributed. Then we can use passive DNS to fill in those gaps in the lower, child levels of those zones. All the TLD data provides us with about 8 gigabytes of DNS data every day, and the passive DNS is probably a whole lot more than that. We also operate a DNS server that slaves a few zones, like isc.org, and when those zones get updated we perform an AXFR or IXFR zone transfer, and the updated zones get archived, processed, and inserted into the database. This is a crazy diagram of our architecture; well, it scaled poorly, but it's a crazy diagram, and this will not be on the quiz. We apologize for that slide. Now we can start the demos. How much time do we have left? What time is it? I'm going to switch configurations real fast. This is the web interface; this is in pre-beta mode. Okay, let's see what we can find. Our domain name search allows us to do wildcard matches, which I don't think any of the publicly available searchable passive DNS databases allow you to do, so this is kind of a neat new feature that we've implemented. We're going to look at *.google.com, and we're going to look only at the records that are present in the top-level com zone. This means essentially name server delegation records and the address records that are necessary for reaching the google.com name servers. So if we see anything out of place in this particular domain in this particular TLD, we know something is up. So here are the four matches that we get. We have up to four timestamps associated with each match; this one has four. We have a pair of first-seen and last-seen timestamps for data that was seen by passive DNS, and a separate set of timestamps for data that was seen in a zone file. So this first match right here indicates that the google.com NS delegations were seen with these records both in the zone file and in passive DNS. And these down here say "in zone file," which means, without the other two timestamps, that these records were only seen in the dot com zone; if they weren't seen in passive DNS, that means we might have a mismatch between the com servers and the google.com name servers. So we see these three records; we have b and f.l.google.com, which are kind of interesting because they exist in the com zone but not in the google.com zone. So let's click on this, and we get matches for two domains, 20comments.com and antifaVLC.com, and supposedly Google is the name server for these names. Let's click on this one, and we get an NS delegation for this domain, and supposedly, according to the dot com zone file, this domain is hosted by Google, the gTLD servers, and h.root. For all the people that know about the DNS, this is an incredibly unlikely event to occur legitimately, so it looks like someone has invented a rather creative method of parking a name. We could do some more clicking around, but I'm going to skip the rest of this example so we can get into some more interesting things. Basically, there's a bunch of weird people out there that claim the root servers and the com servers themselves are hosting their zones, which they don't; they only host the delegations to their zones. Okay. I am not a security researcher, but some of my best friends are. A good friend of mine recommended that this particular audience would be interested in all of the names that are hosted by SoftLayer; apparently it's some sort of a colo hosting provider or something that for some reason people really dislike. So let's see what happens when we want to look up all the names that, say, SoftLayer's DNS servers serve. We're going to do what's called an inverse query, or an inverse search. The DNS 
originally had something called IQUERY, which worked very poorly, and apparently you have to build giant databases to support this sort of thing. So we're going to put in a name, ns1.softlayer.com, and we want to know which domains they host. And here are 1,000 records; this has probably tripped over the maximum limit that the web interface has, so they must have a lot more than that. So we see all these different domains hosted by them. I was going to show you another SoftLayer example; I basically need a /16 that belongs to someone of interest, but unfortunately I neglected to look up that information, so should I use ISC as an example? Okay, let's look up an IP or network match. I'm going to put in ISC's /16 and perform a search. Now this is really cool, because it did it really fast; it eats a /16 for breakfast. So here are a whole bunch of names that point into ISC's name space. There's a whole bunch of crap here; someone is hosting a lot of domains, and I can't imagine why that would occur. But here's basically all of the domains that are pointed into our address space, and this is interesting because the DNS is a forward delegation tree, so you don't really know who's pointing into your address space. So this is a very excellent monitoring application to keep an eye on who's claiming to host something in your address space. I think that's about it for this particular set of demos, so I'll switch over to Paul Vixie, and I think we can turn off the projector now, since I don't think Paul is going to use slides. Thank you, Robert. It's true, I'm not going to use slides. I had some, but they were terrible, so I decided that rather than being distracted by them and not using them for anything, I would just talk. So this passive DNS stuff is very cool. I want to thank Florian again, he's come to DEF CON before, some of you know him, for coming up with this idea. I want to admit once again that when he first told me about it, I thought it was a terrible, terrible idea, until I saw the ways that it could be used by e-crime fighters, whether law enforcement or freelance, to go find out who's doing what to whom. DNS is fairly often used for bad stuff; in fact, it is practically used for everything that happens on the internet, and so anything bad is also using DNS. It bothers me that people that are trying to hurt me get to use a protocol that I helped with, you know, I had a hand in maybe some software, and they're using a global system that I'm helping to keep running in order to execute e-crime. The last time something like this bothered me was in the mid-90s, when I created something called MAPS, which some of you may recognize as spam spelled backward: the Mail Abuse Prevention System. So it's really unlikely that any of you has ever before now been in a room with somebody who's been sued more than I have, and that's a result of me publishing reputation data about SMTP servers. Other people subscribed to that reputation data, which we called an RBL, either by BGP or with DNS, and they would tell their sendmail or postfix or whatever: if it appears on Vixie's list, then you can just bounce all mail that comes from there. And the problem is that the people that were on my list didn't want their mail to be bounced, and they found that stopping the spam that was coming from their networks was hard, whereas suing me was easy. Anyway, my wife has assured me that this won't happen again. So, as a result of sort of all that, 10 or 12 years later, we sold MAPS, and they're still selling the old RBL. There are probably 100 other RBL-like things; I guess the new name for them is the DNSBL, but it will always be an RBL to me, because that's what we called it. And we're all getting as much spam as we ever got, so I've noted that email is less reliable than it used to be: a lot of false positives, and a lot of people have abandoned RBLs in place and put a wildcard in there, so that their subscribers bounce everything until they stop subscribing. I will tell you that the really, really 
first RBL, the very first one, is called rbl.maps.vix.com. We only used that name for about six months, and then we got mail-abuse.org or whatever and started using that. It's in vix.com, and people still send queries to it 12 years later, 12 years after it was turned off. It's rather frustrating. I suppose I could wildcard it, and then they would all bounce all their mail; instead I collect the data and I give it to the passive DNS effort. Anyway, noting again that e-criminals need DNS as much as the rest of us do, and not only do they need DNS, they need it to work as well for bad guys as for good guys, I have decided to try again. Except this time we're not going to create a reputation feed that is subscribed to by email servers. No, no: a reputation feed that is subscribed to by recursive DNS servers. The things we're going to black-hole this time will not be the IP addresses of other people's mail servers; they will be the domain names that the bad guys need for e-crime. So if you go to our website, www.isc.org, you'll see that while Robert was talking I hit the publish button on a blog entry that describes all this. We're also headed for a Q&A room after this, if you'd like to talk more about it then, but just to hit the high points: we will not be sued. ISC is a non-profit 501(c)(3); we do not like being sued, so we're not going to publish any content. I know where a lot of bad domains are, thanks to Robert, but I am not going to make a list of them and publish them, because that's the way you get sued. Instead, we are going to publish patches. In fact, we have just published patches for the BIND 9 recursive name server. You can apply what is about a 600-line unified diff and get a new feature, which is the ability to subscribe to a response policy zone. Yes, we are going to use DNS zones as a way to propagate sort of policy and reputation information about other DNS names. It's a lot like the RBL in that sense. It doesn't slow the name server down, and it has not caused any crashes or core dumps in the last couple of weeks anyway, so there will be no CERT advisories yet; it's not my code. So, let me think, what else. The spec is open: no patents, no license, no nothing. Anybody else, any other recursive name server operator, and Nominum, I'm talking about you, who wants to implement this can do so without paying any tithe. We have talked to a bunch of content providers. There's something out there called an RHSBL, I guess first popularized by Jeff Chan of SURBL, which is like an RBL except it refers to the stuff after the at sign rather than before the at sign, or whatever; I'm not sure exactly what the right hand refers to, but anyway, it's lists of bad domain names. Spamhaus has one, SURBL has one, a bunch of other companies have them. They didn't have a distribution channel for it; now they do. They're going to be able to use this, because we're creating a single universal format that any recursive DNS server could implement and any reputation provider can publish with, in order to have it be that, if somebody is doing something bad from a certain domain name, and if any of your users are using, let's say, Windows, and they click on it, they will become infected, and you can't stop them, because we are all just barely evolved monkeys and we will click on anything that moves, you can make sure that the DNS won't work, so that when they click on it, nothing will happen for them. I hate that we have to do it this way, but since we have to do it this way, I've decided to make it possible, make it easier, and to launch yet another multi-billion-dollar industry to monetize in any way; my wife has asked me questions about that too. So we have some time for questions, either about Robert's stuff, which I think is really interesting, or my stuff, which I think is sort of interesting. How many minutes? 
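As a rough illustration of the response policy zone idea described above: a reputation provider publishes an ordinary DNS zone whose records encode policy, and subscribers transfer it like any other zone. The snippet below is our paraphrase of the concept, not a verbatim excerpt of the published spec; in RPZ, a CNAME pointing at the root is the convention for forcing an NXDOMAIN answer, and all the names here are examples:

```zone
; Sketch of a response policy zone (illustrative, not the exact
; published format).  Subscribers slave this zone like any other,
; and their recursive servers apply its records as policy.
$TTL 300
$ORIGIN rpz.provider.example.
@                    IN SOA  ns.provider.example. hostmaster.provider.example. (
                             1 3600 600 86400 300 )
                     IN NS   ns.provider.example.
; a listed bad domain: rewrite the answer to NXDOMAIN
bad-domain.example   IN CNAME .
*.bad-domain.example IN CNAME .
```

A recursive server subscribed to this zone would answer NXDOMAIN for bad-domain.example and everything under it, so a user who clicks a malicious link simply gets nothing.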
We've got 12 more minutes. So, this room is much more full than I expected, because the EFF party went until 8 a.m., so you must not have been there, and they're talking about the cool badges in some other room, and I really thought all of you would be in there listening to that; maybe there was no room. Anyway, I'm hoping that there are questions; we have time. He asked about covert channels via the DNS. We do notice what appear to be covert channels conducted over the DNS, and these are really annoying for us, so when we find them we tend to blacklist them so that they don't enter the database, because they obviously have the potential to evade the cache, since all of the names and data are unique. We don't want to take anyone's YouTube downloads via DNS and shove them into a database. We haven't really tried to discover covert channels using the passive DNS raw data itself; as I mentioned, I'm not a security researcher, I just build tools that security researchers can use, so I'm afraid I don't have any really hot examples for you. Any other questions? Wait a minute, I want to follow up on that answer. So we've described a database that was built on a bunch of passive DNS stuff. It turns out that the raw data was our original product. The Security Information Exchange is an Ethernet switch full of UDP broadcast traffic; all the data that we get from our sensors, we play as UDP broadcasts on a sort of an Ethernet switch that's in our data center, and then we listen for that on another port on the same switch, and we invite other people to come plug their computers in and listen to that. So it's basically a tap point on our raw data stream. We not only take the raw data, but after we've run it through various filters, like deduplication or this QR stuff, those are also touching down on VLANs on this private switch. Anyone who is in our data center and plugs a computer into our switch is able to see not just the raw data but, as we process it and rebroadcast it, you can see 
it as it gets more and more refined: the intermediate products. Thank you.

So we hoped that that's all we would have to do, and that somebody else would come build this database. In fact, I hoped Florian would come build a database, because we have a lot more data than he had, and he was good at building databases, but he's kind of on to other projects by now. So we have built a database. But if you really wanted to look for covert channels, the raw data is right there. If you're a commercial entity, we will charge you a port fee; on the other hand, if you're a university researcher or hobbyist or whatever, and we can vet you, we will probably waive the port fee, because we want this data to get used. So as far as covert channel detection goes, the fact that we're not doing it should not cause you to think it cannot be done. We have more passive DNS data than anybody, and we are going to keep doubling it every year until Moore's law explodes. More questions, about anything?

We have a bulk query... oh, sure. He asked if there are automated queries, and we do have an interface for doing bulk automated queries. It's a simple HTTP RESTful-type interface: you construct a URL and download it, and it will give you back either JSON or some binary format converted to an ad hoc text format. We actually implemented that bulk interface before we did the web interface, and at some point we'll be able to allow external entities to query that interface. Any more questions?

That's a good question: when will it be available? We will probably start up a beta period right now, and we're interested in reaching out to people who are interested in hacking on a beta-type interface.

Following up, in the back: he asked what is in the 50% of the data that we throw away. The big thing that I noticed when I was building this is spamhaus.org, because it doesn't follow the usual distribution of data. You'll typically see about an 80 to 90%
cache hit ratio for your typical DNS server, but certain zones in particular generate a huge number of unique queries. Spamhaus is a very good example, because it's a DNSBL: the key being queried is an IP address, and botnets are constantly generating all of these different queries. Once I had seen that, I implemented a blacklist that allows us to exclude certain domains and subdomains from further processing, and then I started sorting by the domains that were generating the most data. I found things like, I believe, Yahoo, which chucks a ton of address records into a query; there's a combinatoric explosion of different possibilities, and that was such a large volume that I blacklisted that particular zone they used as well.

Oh, Facebook, that's a good one. Facebook has a chat feature now, this web chat thing where you can instant message your friends. The neat thing about Ajax is that the web browser developers decided to implement a limit on the number of persistent connections you can make, and to get around this, web developers discovered that you can use a wildcard CNAME record and use different names for your server. Someone at Facebook decided to use this in order to make multiple connections, because you could have tons of Facebook profiles open in the same browser session, multiple identities. And the fellow who wrote the code that generates the new name to use decided to use a 10-digit random number. So we ended up with, I think, around 5 megabits continuously of these Facebook records being looked up. They were totally useless, and they had a one hour TTL, so I can't imagine what's happening to the recursives that are caching all those records. So we blacklisted that huge proportion.

Let's see, the other type of filtering that we do: we are not really interested in generically named pointer records, your adsl-192-whatever.bellsouth.net type names. There's tons of that, so we use something called the enemies
list to filter that, and there's a very nice fellow by the name of Steven Champion who maintains approximately 50,000 regular expressions that we use to filter that data. That accounts for about 50% once we do very aggressive caching.

What was your follow-up question? TTL. TTL is not a very strong part of the uniqueness of an RR set, so we don't actually store it. In certain of our intermediate products we do maintain the TTL, but we don't insert it into the database, because it can cause a lot of copies of the RR set to occur.

This fellow in the front: he asked if we have the ability to look up AAAA records in the forward interface, and the ability to look up IPv6 prefixes in the reverse. It's fully generic: all possible DNS RR types, it's just an integer, and the web interface has some cute little drop-downs to select the most commonly used types. The reverse search is done based on a byte prefix, so we can support IPv6 prefixes and IPv4 prefixes, and we can start doing name prefixes and other generic data.

I believe we're almost out of time. Do we have... okay, more questions? Yes? Yes, sure. You can't search for regular expressions, but you can certainly search for exact matches or prefixes.

Questions about the sensor network: it's true that we can't name all of the parties who are giving us data, but on the other hand, some of you should be giving us data. This is all open source stuff for a good cause, so take a look at our website, sie.isc.org, and find out if you should be running a sensor. Some very large ISPs are running sensors, some universities are running sensors. I love it when we get a sensor that covers an undergrad dorm complex, because those people really will click on anything. But we always need more, and I'm not going to miss this opportunity to do some outreach: if you have any data, please share it. Not just recursive servers; if you have authority servers, we can take your queries. I don't care about your answers, because those are predictable based on your own
content, but I'd love your query stream on authority servers, and I'd love your query and response stream on your recursive servers. We're going to open up SIE to other data formats soon; we're going to be looking for NetFlow, darknet, whois (since we've got somebody from ARIN here), all kinds of stuff that should be shared in order that we can cross-correlate. Andrew Fried gave a talk last year about how he was able to use SIE data, the passive DNS, and our spam trap feed to do cross-correlation and find botnets. So if you have spam traps, or you might be willing to start up a spam trap, talk to us: we'll help you extract all the interesting body URLs, format them in the format we need, and then we'll share them with the security community.

The sensor network is not public, because people don't like to be thought of as targets; they don't want to paint targets on their backs. But we won't tell anyone who you are if you're sending us data, so please send us data.

To have some concrete numbers... we're done. Alright.
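The "enemies list" filtering of generically named PTR records described in the Q&A above could be sketched roughly as follows. This is a minimal illustration, not Steven Champion's actual list: the two patterns and the hostnames are invented examples.

```python
import re

# Tiny stand-in for the "enemies list": regular expressions matching
# generically named PTR targets that carry no analytic value. The real
# list reportedly runs to roughly 50,000 patterns; these two are invented.
ENEMIES = [re.compile(p) for p in (
    r"^adsl-\d+-\d+-\d+-\d+\.bellsouth\.net\.$",
    r"^\d+-\d+-\d+-\d+\.dynamic\.example-isp\.net\.$",
)]

def is_generic_ptr(name: str) -> bool:
    """True if the PTR target matches any enemies-list pattern."""
    return any(rx.search(name) for rx in ENEMIES)

records = [
    "adsl-192-0-2-10.bellsouth.net.",      # generic: dropped
    "mail.example.com.",                   # meaningful: kept
    "198-51-100-7.dynamic.example-isp.net.",  # generic: dropped
]
kept = [r for r in records if not is_generic_ptr(r)]
print(kept)  # ['mail.example.com.']
```

Filtering before insertion, as the talk describes, keeps the database from filling up with names that only encode an IP address the reverse index already covers.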
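The deduplication rule mentioned above, where TTL is not part of an RR set's uniqueness, could look something like this sketch. The record fields and helper names are invented for illustration, not ISC's actual schema.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class RRset:
    """A simplified passive DNS RRset observation (fields invented)."""
    name: str                # owner name, e.g. "www.example.com."
    rrtype: str              # record type, e.g. "A", "AAAA", "CNAME"
    rdata: Tuple[str, ...]   # the answer values, order-insensitive
    ttl: int                 # deliberately excluded from the dedup key

def dedup_key(rr: RRset) -> tuple:
    # TTL is omitted: two sightings of the same name/type/data that
    # differ only in TTL collapse to one database entry. The rdata is
    # sorted so answer ordering does not affect uniqueness either.
    return (rr.name.lower(), rr.rrtype, tuple(sorted(rr.rdata)))

# Two sightings of the same RRset with different remaining TTLs:
a = RRset("www.example.com.", "A", ("192.0.2.1", "192.0.2.2"), ttl=3600)
b = RRset("www.example.com.", "A", ("192.0.2.2", "192.0.2.1"), ttl=97)

seen = {dedup_key(a), dedup_key(b)}
print(len(seen))  # 1: both collapse to a single stored record
```

Keying on everything except TTL is what prevents the "lot of copies of the RR set" problem the speaker describes: a decrementing cache TTL would otherwise make every observation look unique.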
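The byte-prefix reverse search described above can be illustrated with a toy index. The storage layout here is invented for the example (the real system presumably differs), but it shows why keying on raw address bytes makes one prefix-match routine serve both IPv4 and IPv6; like the interface described in the talk, this sketch only handles byte-aligned prefixes.

```python
import ipaddress

# Toy reverse index: each stored address maps to an owner name, keyed by
# the address's big-endian byte representation (layout invented here).
index = {}

def store(addr: str, owner: str) -> None:
    index[ipaddress.ip_address(addr).packed] = owner

def reverse_search(prefix: str):
    """Return owners of all stored addresses inside a byte-aligned CIDR prefix."""
    net = ipaddress.ip_network(prefix)
    want = net.network_address.packed[: net.prefixlen // 8]  # whole bytes only
    return sorted(owner for key, owner in index.items()
                  if len(key) == len(net.network_address.packed)
                  and key.startswith(want))

store("192.0.2.10", "a.example.com.")
store("192.0.2.200", "b.example.com.")
store("198.51.100.1", "c.example.com.")
store("2001:db8::1", "d.example.com.")

print(reverse_search("192.0.2.0/24"))   # ['a.example.com.', 'b.example.com.']
print(reverse_search("2001:db8::/32"))  # ['d.example.com.']
```

Because a name is also just a byte string (reversed label by label), the same prefix-match idea extends naturally to the name prefixes the speaker mentions.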