Please give a warm round of applause to Tobias Fiebig, who will be talking about IPv6 global scanning. Thank you.

So, good evening everybody, nice that you still made it even though it's 11 p.m. Of course, this is not work of mine alone. It is joint work with colleagues from UC Santa Barbara: Kevin Borgolte, Shuang Hao, Christopher Kruegel, and Giovanni Vigna.

First, a thing about me: I have a Bachelor of Science in cognitive science and somewhat noticed that wetware is not really my life. So I went on to get a master's in network engineering, and since I got that, I'm more or less desperately trying to get a PhD from Professor Feldmann and Professor Seifert at TU Berlin. My research interest can be summarized as basically what nobody else does, not because my stuff is ingenious, but because most people say "well, that's stupid, that won't work anyway". And I think we all know this as misconfiguration.

You all know these nice people that are running database servers on the internet without authentication. I have heard of a big ISP that actually managed to run their provisioning system on the internet, which was not that good for the connectivity of their customers. There are also other nice examples: mitfahrgelegenheit.de was actually pretty good at doing backups. They even encrypted them, and stored the key right with them. And there's always the fun of finding a storage device that belongs to some bulletproof hoster, read/write on the internet. And something I had to learn about: there is something called remote DMA, because what could be better than doing DMA over the internet?

So, basically, whatever I tell you today is directed at finding these things. But first things first: ethics. I may tell you about a nice tool to have a lot of fun with, but basically: "What are you, stoned or stupid? You don't hack a bank across state lines from your house. You'll get nailed by the FBI. Where are your brains, in your ass?"
Follow this good advice. Think about this event, think about the amount of bandwidth you have, think about the amount of bandwidth other people have. And think about them having this nice time between the holidays at the end of the year where they don't want to work, which might end if suddenly DNS servers start crashing and other services become unreliable.

So, back to my motivation, and some view of related work. The first part will be a little bit dry, so if you start to get bored: there's a special slide where I'll ask you to wake up the people next to you, right before the fun stuff starts. So please don't run away.

In the beginning there was ZMap. That was like three years ago, and it started this whole thing of: we can scan the whole IPv4 internet, we can do internet-wide security evaluations. And that was actually a whole lot of fun. I think two years ago there was this nice service in one of the lecture halls at this event where you would have something like chat roulette, only with VNC servers. Yes, you laugh, but that actually brings us to the second point: it stops being funny when you find some industrial facility in India on one of these slides.

ZMap was heavily used to scan for Heartbleed and to analyze how Heartbleed was mitigated. ZMap was used to scan for, well, at least open TCP ports, where you could then do a key exchange to see how keys are shared between HTTPS servers in the world. There was research on amplification attack servers.
Well, basically the amplifiers for the amplification attacks. And this ability to exhaustively scan actually helps the security community a lot.

But, as with all good things, or rather not so good things in this case: IPv4 is coming to an end. There are only so many addresses, two to the power of 32 in fact, theoretically, and there's a lot of work investigating how this space is getting more and more exhausted. By now, getting an IPv4 address is actually pretty hard.

So what happened is that people introduced IPv6, like 20 years ago roundabout, maybe even more. With IPv6 come a lot more addresses, two to the power of 128, theoretically. And we can actually see that IPv6 is starting to get adopted. The graph I have here is actually a bandwidth graph from this event, and we see that at peak points we saw around 25% outbound IPv6 traffic. This is a little bit skewed, because as soon as you deploy IPv6 to your user network, you will suddenly see a lot of traffic going to like two or three big players, like Google with YouTube, and Facebook. But it still demonstrates that IPv6 is a thing that's coming.

However, if we try to use, for example, ZMap on IPv6, we will suddenly have to wait. I cannot really pronounce that number, but it's really, really huge, and I don't really intend on living that long.

So, let's come to related work. Scanning IPv4 addresses was an interesting tool. It was not only used in the security community, but especially there, and it helped a lot. So people want to have the same thing for IPv6. Current work mostly tries to observe IPv6 address usage in the wild, not by doing active scans, but by having some vantage point which provides information about used IPv6 addresses.

Foremski, Plonka, and Berger are using the access logs of a large CDN.
Of course, as good scientists, they don't disclose which CDN this is, but one of them has an akamai.com mail address. Then there are Czyz et al., who are mostly the vantage-point-freest of them all: they are using various DNS data sources, but not the same ones I will present in this talk. And then there's Gasser et al.: they, for some reason, have access to a large European IXP, can look at flow data there, and thereby get information about utilized IPv6 addresses in the wild.

So let's quickly go over these. Foremski, Plonka, and Berger actually provided nice work where they use their data sources to train a tool which can then predict other active (that means: replying to ICMPv6) IPv6 addresses, based on the data sets it has observed previously. They are mostly using the CDN data; they have a small, really, really tiny portion of data that was gathered in the same way as the data I will present, and they utilize traceroute data. This has certain drawbacks: the CDN logs, for example, do not really represent a lot of, well, servers, especially not servers in the wide internet out there. And traceroute data is by nature really, really biased towards networking devices, because, well, it's traceroute data. The work by Gasser et al. is in principle somewhat similar: they also try to predict active IPv6 addresses based on a data set they have.

Then we have Czyz et al. Czyz et al. are actually doing various kinds of mining on DNS data sets. They have recursor data sets, so they actually get logs from DNS recursors, but they are also doing something really fun where they do a PTR lookup for every address in the IPv4 space, and then a quad-A (AAAA) lookup to get the IPv6 address for that FQDN. And their work is actually one of the big motivations for what I'll be presenting today, because they figured out that IPv6 security is not really doing that well. In other words: it is doing terribly.

But, well, so, in summary: I looked at my basement.
I didn't find a CDN. I looked a bit further, and I didn't find an IXP. And when I asked my ISP if they wanted to let me read their recursors' access logs, they also said something including "no".

Well, but on the other hand: if I were a politician, I would now be like, but we have this IoT, this Internet of Things, so we really need the scanning to protect all the people. So, well, we came up with a methodology. By the way, this is the point where you want to wake up the people next to you that didn't really care about related work.

First things first: there's another person I'd like to credit, Peter van Dijk. I'm not really sure if I can do the "ij" thing correctly because I'm not Dutch, but I tried. Anyway, I met him at the last IETF in Berlin, and as I had heard that he's Dutch, I believed him when he started to talk about DNS. And he said: if I want to get a lot of IPv6 addresses that are assigned to servers, not clients, I should actually look at, well, IPv6 reverse DNS. He said he had some interesting preview-result thingies, and that I should start to dig into that. Well, as I said, Dutch people know about DNS; most of the DNS servers we are currently using come from the Netherlands. So let's try it.

Let's start with a short recap. An IPv6 address, as I said, has 128 bits, and you can represent it in 32 so-called nibbles; those are the little things representing hexadecimal characters we see here. And reverse DNS is a way to get, for an IP address, an FQDN, a fully qualified domain name. You do this by doing some transformation on the IPv6 address.
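To make that concrete, here is a minimal sketch of the transformation using only Python's standard library (the `ipaddress` module even ships a `reverse_pointer` helper that produces the same name):

```python
import ipaddress

def to_ptr_name(addr: str) -> str:
    """Expand an IPv6 address to its 32 nibbles, reverse them, and
    join them with dots under the ip6.arpa tree."""
    # .exploded pads every group: 2001:db8::1 ->
    # 2001:0db8:0000:0000:0000:0000:0000:0001
    nibbles = ipaddress.IPv6Address(addr).exploded.replace(":", "")
    return ".".join(reversed(nibbles)) + ".ip6.arpa"

print(to_ptr_name("2001:db8::1"))
# the stdlib equivalent:
print(ipaddress.IPv6Address("2001:db8::1").reverse_pointer)
```

Both calls print the same 34-label name (32 nibbles plus `ip6` and `arpa`).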
So you first reverse it, and then you basically put dots between each of the nibbles for the different levels in the PTR tree. As top-level and second-level domain you have ip6.arpa, and under that tree you then find a record. For that record you can do a PTR request, and you get an FQDN back from the DNS server.

The next thing that is somewhat important at this point is how DNS works. RFC 1034 (I hope I get this right, I see somebody in the audience that will probably beat me up if I don't) is not really clear about it, but tries to hint at it, so it got clarified in RFC 8020: the meaning of NXDOMAIN. Technically, NXDOMAIN should mean: if you receive the reply NXDOMAIN from a server for a point in a tree, like the one to the right, then there is nothing at the point you requested, or anywhere thereunder.

So what this technique basically is, is descending into trees where we received a NOERROR for the root node. We can actually try this. We have our given root here, which is ip6.arpa. We do a DNS query for 0.ip6.arpa. We get an NXDOMAIN, so we don't descend into that tree. We do the same for the 1, for the 2, 3, 4, up to the e, and we always get an NXDOMAIN, so we know we don't have to descend into those trees. But for the f we do get a NOERROR. We don't get any data back, but we get a NOERROR. So, well, we descend into that tree. Let's try. (I never managed to clear our class.)
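Carried down all 32 nibble levels, this descent can be sketched as a small recursive function. This is a toy illustration, not the published tool chain: `exists` stands in for a live DNS query (NOERROR meaning True, NXDOMAIN meaning False), which a real implementation would issue over the network, for example with dnspython.

```python
HEX = "0123456789abcdef"

def enumerate_tree(exists, prefix="", depth=32):
    """Yield every full-depth nibble string (in reversed-address
    order) whose ancestors all answered NOERROR."""
    if len(prefix) == depth:
        yield prefix
        return
    for nibble in HEX:
        candidate = prefix + nibble
        if exists(candidate):  # NOERROR: descend; NXDOMAIN: prune
            yield from enumerate_tree(exists, candidate, depth)

# Mock zone with exactly one utilized address, so a node "exists"
# if and only if it is a prefix of that address:
target = "f" + "0" * 31
found = list(enumerate_tree(lambda p: target.startswith(p)))
print(len(found))  # 1
```

Only 16 queries are spent per level, so the total number of queries grows with the number of utilized addresses, not with the size of the address space.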
Sorry. Anyway, we do the same thing again, and this time we start with the zero and immediately get a NOERROR, no data but no error, so we can descend into that tree right away. We do the same thing again, and again descend, and by doing this 32 times (an IPv6 address has 32 nibbles) we finally arrive at utilized IPv6 addresses. And you see: if we always start with the zero, this is basically a depth-first search in this huge DNS tree that is spanned by the reverse DNS naming system.

As I said, this is in itself not new. There is RFC 7707, which actually discusses this as a possible technique for network reconnaissance in smaller networks, and it actually references this Peter van Dijk who recommended the technique to me. He wrote a small blog article about this like three, four years ago, and provided a Python implementation which you can utilize to scan a network, or rather a DNS PTR prefix. This was also the starting ground for the tooling which will be provided to you at the end of this talk.

What I used to conduct these scans is, well, basically the compute server we have for doing machine learning at the research group, and my colleagues were not so happy that they had to stop doing that while I did my scans. But as I produce something called academic code, which is basically like startup code, just that we don't call it production-ready (everybody who ever worked in a startup will love this), I'm pretty sure that somebody who is actually better with code than me, so like a programmer who can actually write production code, will also be able to run this code.
Well, the code they rewrite themselves, on their laptop.

The first thing I noticed when I thought about this technique was that I do a depth-first search, which is a pretty bad thing if the first part of the tree is on a really, really slow DNS server on the other side of the world. So you want to have the opportunity to do some kind of breadth-first search, to parallelize over different DNS servers. For this I basically just iterate in four-nibble steps. In a first step, I will enumerate up to a length of four nibbles for all possible ip6.arpa entries. Then I will collect all of these, and for each of them do the enumeration step for another four nibbles. Based on flavor and personal taste, one may also opt to go from 16 nibbles to 32 directly, so from 64 bit to 128 bit, which may have advantages and may have disadvantages; we will see that later when we look at the data.

So this was basically what I built directly after the IETF. I hacked it together, went to the office, ran it, and was pretty happy, because after I let it run for a week, I had found 70 million records. I then looked into the data, and was wondering why, when I got this funny email from a large ISP complaining about a lot of traffic from my machine to theirs. And I learned about a really nice feature you want to have in your reverse zones when you are offering reverse zones, or rather networks, to customers and end users: dynamic reverse zones.

I have an example here. When you have a big network and you don't want to manually set all those two-to-the-power-of-64 reverse entries for all the possible users you could have, you just put a script in your DNS server that generates a reverse FQDN for any given, well, reverse pointer query. These may also be static. I also found funny dynamically-generated-looking domains that are probably DNS tunnels or something like that; at least they were always returning 32 nibbles of something random. Point being: in this case, you will never find an NXDOMAIN.
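On the server side, such a dynamic reverse zone boils down to a tiny bit of string mangling. Here is a sketch of the idea; the naming scheme and the `dynamic.example.net` base are made up for illustration:

```python
def synthesize_ptr(qname: str, base: str = "dynamic.example.net") -> str:
    """Answer *any* ip6.arpa query with a generated FQDN, so an
    enumerator never sees an NXDOMAIN anywhere in the subtree."""
    nibbles = qname.replace(".ip6.arpa", "").split(".")
    addr_hex = "".join(reversed(nibbles))  # undo the nibble reversal
    return "x-" + addr_hex + "." + base

query = "1.0.0.0." + "0." * 20 + "8.b.d.0.1.0.0.2.ip6.arpa"
print(synthesize_ptr(query))
```

Since every query gets an answer, a naive depth-first walk descends into all 16 children at every level and never prunes.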
So when your algorithm just plainly, stupidly goes in there, it will find a lot of records, but they all belong to, well, however big this dynamically generated reverse zone is.

So I thought about a heuristic. I first thought about doing some computational-linguistics-enhanced stuff, where I would compare the returned FQDNs to figure out whether they may or may not be somewhat related. Turns out: doesn't really work, doesn't really perform. What performs far better is just trying to query a static set of records, and if at least three of these exist, assume the domain, or rather the subtree, is dynamically generated. "At least three records" is a personal preference, as is the choice of filling up the tree to a length of 32 nibbles with zeros, ones, twos, threes, up until f. It's a not-so-personal preference in that some people recommended to me to do this with random data instead, but then you actually have to have enough non-blocking entropy for that. So, well, a question of taste, but this actually works.

Using this, I actually tried again. I started at ip6.arpa, let it enumerate, and then I found 1.6 million records. And, well, I had started off with like 70 million and was really amazed, and now I only had 1.6 million records, so I was somewhat confused. Which brought me to another nice finding: there are DNS servers that are, for some reason, not RFC 8020 compliant. They may actually send an NXDOMAIN instead of a NOERROR.
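Stepping back for a second, the padding heuristic just described can be sketched like this; `exists` again stands in for a live DNS query, and both the probe pattern and the threshold are the personal preferences mentioned above:

```python
HEX = "0123456789abcdef"

def probes(prefix: str):
    """Pad a candidate subtree out to 32 nibbles with repeated
    0s, 1s, ... up to fs: 16 probe names in total."""
    return [prefix + c * (32 - len(prefix)) for c in HEX]

def is_dynamic(prefix: str, exists, threshold: int = 3) -> bool:
    """Flag the subtree as dynamically generated if at least
    `threshold` of the padded probe names exist."""
    return sum(1 for p in probes(prefix) if exists(p)) >= threshold

# A dynamic zone answers everything; an honest zone almost nothing:
print(is_dynamic("20010db8", lambda p: True))   # True
print(is_dynamic("20010db8", lambda p: False))  # False
```

Sixteen extra queries per candidate subtree are cheap compared to exhaustively enumerating a synthesized /64.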
So: if the DNS server on which f.ip6.arpa resides sends an NXDOMAIN because it has no explicit record at that point, well, then I will never see 0.f.ip6.arpa. To counter this issue, I got the idea of seeding my algorithm: I do not start at ip6.arpa, but at various well-known, predetermined-to-exist, well, subtrees of the DNS tree under ip6.arpa.

I built a somewhat funny algorithm for that. At each step, so at each nibble length (remember the slide a couple of slides back), for four nibbles I would crop the seed record to a length of four nibbles, and later re-add the full length of the seed record again, just so that at each iteration I get all the information possible out of my seed records. The seed sources I used were the Route Views project and the RIPE NCC's, how is it called, something with BGP, RIS, I always forget. Anyway, these are publicly available and are actually documented in the script that will be published with this talk. Other possible seed sources are of course the approach Czyz et al. used, where they would do the quad-A lookups for the FQDNs returned for IPv4 reverse pointers, and basically whatever data set you can get your hands on. If you are, for example, at a big event where a lot of people are using remote IPv6 servers, and you were actually able to read the network traffic, which I recommend not to do, then you could also use that as a source for seeding.

With the seed set, I actually ran the programs again. This time it took nearly three days, running in parallel with a couple of threads, and I found 5.3 million records. That sounds really small, but due to the, well, state of IPv6 deployment practices, this is still pretty decent. It doesn't show the full picture, but it shows a large subset. But I was not really satisfied with the speed of this.
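Going back to the seeding step for a moment, the cropping can be sketched as below, where the seeds are nibble strings (in scan order) assumed to be derived from prefixes announced in the Route Views or RIS dumps:

```python
def crop_seeds(seeds, length: int):
    """Crop every seed to `length` nibbles and deduplicate, giving
    the known-to-exist subtrees to enumerate at this iteration."""
    return sorted({s[:length] for s in seeds})

# two hypothetical seeds from 2001:db8::/32 and 2001:db9::/32
seeds = ["20010db8", "20010db9"]
print(crop_seeds(seeds, 4))  # at 4 nibbles both collapse to '2001'
print(crop_seeds(seeds, 8))  # at 8 nibbles they stay distinct
```

Because the full seeds are re-added at the next iteration, a wrongly returned NXDOMAIN above a seed can never prune away the subtree that is known to exist.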
So I tried to parallelize more, because I realized that my server was, well, not really that busy. I tried to run it with 400 threads. It ran faster, and found less. And the lesson I had to learn was that basically the one IP address I used for resolving ran out of sockets. So if you want to heavily parallelize this, do it on multiple machines, with more addresses to actually use for outbound DNS queries. Another thing that falls into this: if you are running this yourself, have a local resolver, because latency to your resolver (for example, in this network, the resolver offered by network operations) will extremely increase the length of these scans, and you will possibly overload it. So don't run it against your Speedport.

So, this is basically the start of further case studies. A first look into the data: you see plotted here the number of queries I needed per record in a /64, and we basically see a nice distribution between extremely structured zones, so zones that have ::1, ::2, and so on, and those that have nearly random addresses, possibly EUI-64 addresses. To the left are basically those where I need more queries to find fewer records, and then here are the structured ones, which due to the depth-first nature of the algorithm will also be found first, and which are found with fewer queries.

But I promised more fun parts. You can also utilize this tool to look at something specific. From the huge data set you collect, you can single out specific networks. This, for example, is an overview of how a software-as-a-service provider does their network assignment policy, so how IPv6 addresses and networks are assigned by that service provider.
Figuring out which service provider this is, is left as an exercise to the reader. And I was told that this is too academic and just boring, so I looked at something else which is a little bit more fun. I personally do not really do a lot of stock trading, but if you look at IPv6 networks: people suddenly have a whole lot of addresses to use, so they will address everything. They will address loopback interfaces with public addresses, they will address IPMI with public addresses, everything. So if you look at such a network, in this case starting here in September with roundabout 65 K hosts, you suddenly see a huge jump of around 10 K in just a matter of, well, two weeks. And like two months later, you can realize that this provider actually announces a record sales quarter. So this is one of the opportunities where you could use this tooling for actually gathering information on company growth.

But what is probably a lot more fun is looking at topologies. For me, this was the most fun thing I could do with this, apart from security scans, where we all know what they will turn up. So the first thing I did was looking at an IXP. By the way, I'd like to mention that this all is my personal opinion, and I'm not representing affiliations and so forth.

So I rendered graphs: grey are hosts, blue are /64s, red are /48s, and green are /32s. And if you zoom out a little, you can also see black ones; black ones are hosts that are, for some reason, connected to two /64s or more. After importing, this actually looks more like a black cube than anything fun. However, as it is basically a directed graph, one can sort this, make it float apart a little, so you can actually recognize parts of the structure. You can add funny labels to it, and then you can look at what you can see. So this is the network of a huge IXP. This, for example, is their PoP in Hamburg, basically their peering network. Of course, they also have a presence in Frankfurt. And a nice thing you can actually see:
our black friend over there. You can actually see the networks the customers have behind the router, behind the peering in Frankfurt. They also have this famous blackholing infrastructure, which you can see being connected to their provisioning and some internal routing or interconnect infrastructure. And firewall-6 appears to be their central firewall in Frankfurt, with a whole lot of hosts connected to it, for example an Elasticsearch system for customer accounting and for processing flow data.

But I think nerds are somewhat more into other networks. So this is an example of nasa.gov. There are also somewhat bigger topologies one might want to look at. Everybody familiar with .mil? So, a little side note: I'm also doing the monitoring for network operations here, and at the moment this is the only network that I have the feeling I know the topology of better than the network at this conference. For reasons, it will remain anonymous for the purpose of this talk, but technically the data sets are public.

So, let's have a huge journey through a network. Like all good networks, it has a border router, and some portal systems, administration interfaces, and possibly user access networks. They also run network infrastructure, for example hosts called "big brother", which is actually an ancient monitoring system (I have to give them that) which they apparently need a whole lot of servers for. But they are currently, apparently, migrating to Prometheus, which is a somewhat newer monitoring system.

So let's look at their infrastructure. What do they run? They run a lot of open-source software, so we see Gentoo, CoreOS, and they also have a Puppet production host. Well. They also have an "admiral-akbar", which made me think, which made me think that it might be a honeypot. But what worried me most is actually this one. On a little more serious note: what this also teaches us is that these networks are run by nerds.
So, some more fun things, due to popular demand: typical IoT. The redactions happen because, well, for some reasons, it is not opportune to expose the IPs of fridges in military installations when you have collaborators from the United States.

Another nice thing we actually found in our data set is TCP 666, the number of the backplane. 666 is in this case a placeholder, because, for reasons: I actually talked to the CSIRT of that company before this talk, and they were like "hmm, nice, yeah, we kind of know about this", and I was like "cool, nice, so I can talk about it", and they were like "yeah, well, about that".

So, when we scanned some of the IPv6 addresses we collected, we would find this TCP port being open on hosts that would also expose telnet, SSH, and BGP, and that somewhat looked like backbone links. Funnily enough, the actual TCP port, which for that reason is not on the slides, is more related to a technology usually found on a backbone router. After some communication with a couple of friends, we figured out that technically this port should be bound to localhost on these systems, as it is used for some backplane services. Funny side note: the vendor didn't even know how the customers managed to get this exposed.

Then we have this DHCPv6 story. We all know the issues of, well, devices that suddenly get IPv6 addresses and then become reachable; my most favorite ones were printers. And something you see a lot is that these devices that are then exposed, and that are really vulnerable, actually do have a forward pointer for the FQDN that is returned for the v6 address, and that forward pointer points to an RFC 1918.
So, a private network address. Other things you find: out-of-band management in various forms, what you basically use when everything burns down. One really nice, large operator actually has all their IPMI interfaces reachable via ICMPv6. Which made me wonder what would happen, which I of course didn't do, but, well, it's something I'm wondering about, because, well, you don't really want your IPMI on the internet. And, also something for the people a little bit more into network infrastructure: besides our exposed backplanes, you will also find a lot of RIP out there, a lot of BGP, and telnet and SSH of internet infrastructure and internet backbone services.

Another fun thing to find is Docker. One part is the nice things you can deploy with Docker, for example Elasticsearch. It's really amazing who runs Elasticsearch as a service without authentication, only reachable via IPv6. But also reachable via IPv6: Docker instances themselves. I don't know if you know what these TCP ports mean for a Docker installation; this is actually an API which you have to protect. There is information in the documentation on how you can enable authentication; there are actually good blog posts on how you can enable authentication using certificates and an nginx reverse proxy. But you can also just expose it to the internet.

Well. So this is just a subset of the opportunities you can have, and which you shouldn't have. Besides actually doing the right thing and doing firewalling for your IPv6 network, you could also do something, well, security-by-obscurity-ish: you could try to configure your reverse DNS zones in a way that always returns a NOERROR with no data, basically masking them against this technique. It feels hard to say, because this is basically an attack on my own technique, but: do this. This was already available in 2012, when van Dijk first played with this. There are tools available, not applicable to all DNS servers out there, but the concept should be clear, so it should be implementable. On the other hand, we should think about the techniques that probably remain applicable even if this specific technique I just presented is not available.

So, let's quickly conclude. In the end, you can try to reject packets, but you cannot hide your network topology. You should think before setting PTR records, and you should think before connecting things to the internet. You can download my tool chain from the GitLab instance of my research group, and there's an academic publication coming up in March 2017 which deals with this technique from a more measurement-related point of view. Thank you, and I'll be happy to answer any questions there are.

Herald: Thank you very much indeed. The first question, as usual, goes to the Signal Angel.

Signal Angel: Did you actually think about trying DNS zone transfers, to see if a server would give up a bunch of PTR records due to misconfiguration?

Answer: Using other techniques like AXFR is discussed in the research publication, but we didn't actually try it.

Herald: Well, there's a good link. Next question goes to microphone number five.

Question: Yeah, you got me lost a little bit here when you said, and maybe correct me if I'm wrong, that you should run your own local DNS resolver to get faster, I don't know what. And I'm kind of wondering: did you use some kind of asynchronous I/O in order to parallelize your resolutions, instead of using a local resolver? Or maybe I misunderstood what this was solving.

Answer: Okay, so, running a local resolver on the machine you're doing the measurements on is basically solving the issue of overloading the local resolver of your research institution, of having your system administrator standing in your office being really, really angry. You know, all these things that happen when they suddenly have to deal with a DNS recursor that is mostly doing the queries you are doing, and not those of the other people in the room or research institute.
Question: I see, so it's not improving the speed?

Answer: It actually helps a little with speed, because either you do the resolving yourself or you put it somewhere else, and you basically have to wait for the answers. So if you have a little bit of latency towards your recursor, the whole process will take longer than if it is on your local machine; you basically remove one small part of the latency. But again, you can probably do this better by doing the recursing and resolving from the tool chain itself, which I didn't do, because I'm not a good programmer.

Herald: Okay, thank you. Next question goes to microphone number one.

Question: Okay, so did you actually consider that there might be a few servers hidden in those subnets which have these generated PTR records?

Answer: Yes, there may be servers hidden in there, but due to the dynamically generated PTR records, I cannot verify this without exhaustively enumerating the whole, well, the whole tree, which just gets really, really large.

Question: Okay, so you just don't know?

Answer: I don't know, and I cannot know.

Question: All right, thank you.

Herald: Okay, next question goes to microphone number two.

Question: Hi. From experience, I found that IPv6 reverse lookups are more sparse than the IPv4 versions. So do you have a reference on how many hosts you find with this technique versus hosts that are there without a reverse lookup?

Answer: I would really love to have that. So, basically, to make the statement you just asked for, you would have to compare this data set with another, independent data set which looks at IPv6 from another angle, for example a large IXP, or for example a CDN data set; well, a CDN data set would be biased towards clients, not servers. At the moment I didn't have access to those, so I couldn't investigate that.

Herald: Okay, microphone number six.

Question: You mentioned that the results, or the data set, of your work are public. So where is it?
Answer: Well, it's actually stored in a distributed manner on a lot of DNS servers. And, as you know it from all the cool platforms, there's a dedicated download client you have to utilize. The download client's source code is available at that GitLab location. Good luck.

Herald: Okay, microphone number five.

Question: Hello. A quick question: most of the data on the slides was sort of Western, the .mil exploration and so on. So, just as a general question: did you explore a lot of this on the Asian networks of, say, Korea or China, which have massive IPv6 use? Because of the latency thing, you know.

Answer: Actually, I didn't really focus on a per-continent analysis. So far, in the academic work, I mostly presented the technique in itself, and I picked interesting case studies that would demonstrate the potential of the technique. But what you suggest is actually an interesting point for further work. For anyone interested in that, I would recommend Hong Kong as the starting point to do the scans.

Question: Cheers.

Herald: Okay, any more questions? Okay, thank you very much.