Good morning, and thank you for joining us today. We are excited to be here taking part in DEF CON China 1.0. This talk is IPv666: Address of the Beast. I am Mark, and this is Chris.

Hey, what's going on, everybody? My name is Christopher Grayson. I'm originally from Atlanta, Georgia. I went to the Georgia Institute of Technology a few times and was the head of the Georgia Tech hacking club. I've been the head of the red team at Snapchat, and I'm presently a security engineer at Bird Rides.

My name is Mark Newlin, and I'm a hacker by formal, or lack of formal, education. I have done a lot of work in the wireless security space, using software-defined radio to reverse engineer wireless protocols, and I've done a lot of vulnerability research on wireless devices. At this point I have discovered and published vulnerabilities in wireless devices affecting somewhere north of 25 vendors. In my spare time I've done a handful of DARPA challenges, placing in the top three in multiple of them. I'm a former member of the red team at Snapchat, and I now work as a security engineer at Bird Rides.

And so what are we going to be talking about today?
And the short answer is the future. If you didn't know this, Chris and I actually recently went to the future, and while we were there we discovered that there were IPv6-connected devices as far as the eye could see. Everywhere. So we decided that we would build an open-source tool that you can use today to discover these IPv6-connected devices. We've been working on this project for about a year and a half now, and in that time we've made a lot of mistakes and learned a lot of lessons. This talk is the story of how we went from where we started to where we are now, and how we built this neat tool.

I want to point out that this is a pretty technical talk, and we're going to gloss over some of the granular technical specifics. We're doing this for the sake of translation, clarity, and time. All the details are still in the slides, and if there's anything that we didn't cover in depth that you would like to learn more about, please find us after the talk and we are happy to answer any questions. I also want to point out that Chris and I are not network engineers, and we've been learning about IPv6 as we go. So it's possible that we've got something wrong. If this is the case, please correct us, because we are here to learn just as much as we are here to share our experiences.

We're going to start with some background on IPv6 and our motivations for doing this research. We are then going to look at the difficulties of scanning the IPv6 address space. We are going to go over some techniques that we tried for discovering these IPv6 addresses, some that worked well, some that did not. We're going to look at the latest iterations of our scanning algorithm. We're then going to look at the latest iteration of the software, where we now have the ability to persist scan results to the cloud so that we can have a crowdsourced global data set of IPv6 addresses. We're then going to look at some results from these latest scanning improvements, talk about the tool that we've released, IPv666, and
then have our conclusion.

So, a bit of background on why we care about this work. Here we have a plot of the percentage of users connecting to Google over IPv6 over the last 10 years. We have 2009 on the left and 2019 on the right, and we can see that a decade ago there were almost no users connecting to Google over IPv6. This has grown over time, and now in 2019 we are approaching a third of users connecting to Google over IPv6. This is just a single internet company, but it's representative of the fact that IPv6 is growing, and as security practitioners we should be spending more time thinking about its security implications.

We got specifically interested in IPv6 as a result of some research we conducted a couple of years ago. We presented at DEF CON 25, with our friend Logan Lamb, a number of vulnerabilities that we discovered in Comcast cable modems and set-top boxes. One of the big takeaways from this project was that a lot of these severe vulnerabilities could only be exploited over IPv6, and to do so you needed to know the IPv6 address of the target device.

To illustrate this, I'm going to talk about one of the more interesting vulnerabilities. If you're a customer of Comcast, you have a Comcast cable modem. You have the administrative web user interface that you can access on the local network, and there's a separate administrative web UI that can only be accessed from Comcast, via a specific IPv6 address, from a specific Comcast network segment. We also spent some time looking at the set-top boxes, and there's a service called Send to TV. If you're a Comcast customer, you go to this website, you put in a URL, you hit go, and it displays that website in a web browser on your TV. We discovered that these set-top boxes actually exist in the same protected network segment as the Comcast customer support agents. This meant that we could take the IPv6 address of any target customer's modem, put it into this Send to TV tool, and actually load the
administrative web UI of their modem on our TV, and then sign in with hard-coded credentials. So this meant that we could actually remotely administer any customer's modem, as long as we knew their IPv6 address. This was our first hint that knowledge of how to discover these IPv6 addresses could have some pretty interesting security implications.

Yeah, so that was quite a fun project. All of our research for that project is available on GitHub if you want to take a look. So we do this, we have a lot of fun with it, and we start thinking, oh, you know what, maybe we should dig into IPv6 a little bit more. And we came away with a handful of different things that left us scratching our heads, thinking, huh, this seems like it might be concerning as far as security goes.

One thing is that IPv6 works out of the box without any explicit manual configuration. In IPv4 you are typically relying upon DHCP servers to give you your IP address. You don't need that in IPv6 anymore. There's an improvement called SLAAC, Stateless Address Autoconfiguration, where if you have the ability to speak IPv6 and your networking equipment can speak IPv6, then you can just provision yourself an IP address.

Not only that, but all of your devices and networking equipment both support and prefer it, assuming that it's modern equipment. This is your phone, this is your laptop, and this is your gateway. So that means that, A, your address is going to be provisioned automatically, and B, if that is possible, your devices are going to prefer speaking over IPv6 wherever possible.

Third, there's no such thing as private address space anymore in IPv6, for the most part. There's this thing called unique local addresses, but right now, when you go home and you plug your laptop in or you connect to your Wi-Fi network, you typically get a 10.* address or a 192.168.* address. You're on a private network, and that means that folks from
the open internet can't start routing traffic to you without you opting in and doing a NAT punch-through. That's going away now. Basically we're going back to the world where we have to rely upon firewall rules to prevent access to your devices. We found this out when we first started looking into this: Mark is sitting at home on his couch, I'm sitting at home on my couch, and he was able to ping my Chromecast that was on my network. That was kind of odd to us.

Your IPv4 firewall rules also don't apply. So even if you thought, you know what, I know about IPv6, I'm going to check my firewall rules and make sure that I'm blocking all incoming traffic, and you run iptables -L, well, it turns out that that has nothing to do with IPv6. There's actually a separate list of rules and a separate utility for managing them. It's ip6tables. Otherwise it looks entirely the same. But just because you have your IPv4 firewall fully configured, that has nothing to do with IPv6.

As somebody coming from a red team background, my entire understanding of ICMP, the Internet Control Message Protocol, is that it is for ping scanning hosts. And so, you know, people are commonly asking, how do I block ping? How do I do this? How do I do that? And one of the pieces of advice that is given out is: just block all ICMP traffic. Well, it turns out that you can't really do that in IPv6. ICMPv6 is a critical protocol. If you block it everywhere, you are not able to route traffic. In IPv6, the Address Resolution Protocol has been turned into the Neighbor Discovery Protocol, which is packaged into ICMPv6. So there you go, it is a critical protocol.

And then lastly, there's no more notion of broadcast in IPv6. In IPv4 there's broadcast, and there's this thing called multicast, which is basically: I take a packet and I send it to an IP address that makes the networking equipment propagate it to a bunch of different endpoints. But it was largely unsupported, unimplemented, and unused. IPv6 only
has multicast, and if you are to spec with IPv6, then you support it out of the box.

So we read all of these things and we're like, huh, this seems like it might be concerning. We should totally go test this hypothesis that the IPv6 security posture might be worse than IPv4. And then we ran right into the problem that the rest of this talk is going to be covering.

So as Chris mentioned, we want to validate our hypothesis that the IPv6 security posture is potentially worse than IPv4, and to do this we need a number of IPv6 addresses to look at, to see what these devices are actually doing. And we quickly run into this problem of scale. IPv4 has a 32-bit address space, which gives us just shy of 4.3 billion addresses. This is a lot of addresses, but you can nonetheless have a single computer do a TCP connect scan on one port across the entire IPv4 address space in an afternoon. With IPv6 we go up to a 128-bit address space, and this is a much, much larger number of addresses. On the bottom of this slide is the total number of possible IPv6 addresses, and I have no idea how to even begin pronouncing this number. It has 13 commas. It has far more addresses than we could ever possibly hope to scan. So we need to come up with creative ways to find potential addresses to go and scan.

And then we have something called PSLAAC, which complicates this. Originally there is a protocol called SLAAC, Stateless Address Autoconfiguration, which is used to generate an IPv6 address. We can think of an IPv6 address as having two components. We have the network bits, which are commonly the upper 64 bits of the address, and these represent the network that the device is connecting through. And we have the host bits, which are commonly the lower 64 bits of the address, and these represent a unique identifier for that host. With the original version of SLAAC, the host bits were just a direct transform from the MAC address of the network interface, and this transform would be the same regardless of what network
you were on. So this meant that if you connected your computer to the internet over IPv6 at home, you would have the network bits from your home internet and the host bits that were related to your MAC address. Then when you connected from a coffee shop, or from work, or from somewhere else, you would have a different address because you would have different network bits, but your host bits would be the same. This meant that website operators could actually track you as you went to different locations and connected to their services, because these host bits would be the same.

So PSLAAC, which is the privacy extensions for SLAAC, introduces pseudo-random entropy to the process of generating these host bits. This is good for privacy, because it means you can no longer be tracked as you connect from multiple locations. But it's bad for us, because that high entropy means it's difficult for us to do any kind of probabilistic modeling to predict what these addresses are going to be. So this means we really have to break this down into two separate problems. We first have the problem of identifying the high-entropy PSLAAC addresses, and then we have the problem of identifying the lower-entropy non-PSLAAC addresses.

In the case of the high-entropy PSLAAC addresses, we decided to get creative and see if we could use honeypotting techniques so that, instead of predicting what these addresses would be, we just get these devices to connect to us. We started by setting up a web server, a DNS server, and an SMTP server on a cloud instance that we spun up. We used a few techniques to get traffic to these services, and the idea was that we could start collecting a list of IPv6 addresses.

So we started by setting up a honeypot DNS server, which was just a BIND server. At this point we thought we needed to be clever, and we assumed that we would have to funnel connections that were coming in on IPv4 over to IPv6. What we didn't realize at the time is that on modern operating
systems, if a device can connect over IPv6, it will automatically prefer IPv6 in most cases. It turns out that all the documentation we read was correct, and so we ended up doing some extra work. We thought we were being clever, but it was just extra work that didn't really net us anything.

Here we have a plot of DNS requests to the service over time. We can see we have a few spikes here, and these spikes correlate to campaigns we ran with an ad network called PopAds. PopAds is an ad network where you give them money and they drive traffic to your website. We selected PopAds because they were the cheapest ad network we could find. Here we have a plot of some traffic from PopAds over the course of two minutes. In this case we gave them ten or fifteen dollars and got between forty and fifty thousand requests in two minutes. And you might think, how do you get forty or fifty thousand real users to click on your website in two minutes? We do know that the referring page for a lot of these requests was just a blank page with a JavaScript payload. So it's tough to say, but it was maybe not a hundred percent legitimate traffic.

Then we set up a honeypot web server. This was initially running at ipv6.exposed, which is where our web portal is up now, and we were serving this over both IPv4 and IPv6. The idea was that if the page was requested over IPv4, we would have images served over IPv6, so that we could get IPv6 addresses even from IPv4 clients. Again, this was a case where it was not necessary, because everything just automatically connected over IPv6. We also had a WebRTC JavaScript payload on this page, which would enumerate the private IPv4 addresses of the client as well as their IPv6 addresses, and post those back to a web service we controlled.

Now, one thing I want to note here: we tried posting links to this all over social media. It turns out we're really not that popular, and so we didn't get any traffic from that. So this again was a case where we used PopAds to drive traffic to
this. Here we have plots of the number of access log requests from the web server over time on the top, and the HTTP postbacks on the bottom. This is over the course of around 10 months, and again we have these spikes where we did the PopAds campaigns. Now, on the top the access log requests are on the order of tens of thousands, but the postbacks are on the order of thousands. This tells us that while we had a lot of requests to the web server, most of those clients weren't actually executing the JavaScript payload, and so it again limited the quality of the data that we received.

Then we tried to set up a honeypot SMTP server, and we thought we were being really clever. We thought we would post email addresses pointing to this all over the internet and sign up for spam lists, and our hope was that we would have a lot of DNS resolutions and a lot of SMTP hits from these email addresses. Unfortunately this was just a major bust. We had a small number of hits from infrastructure email providers like Hotmail and Yahoo, but nothing terribly interesting with the SMTP honeypot.

So what do these honeypot results look like? Over the course of ten months we discovered around 92,000 unique IPv6 addresses, and in practice we ended up spending closer to a thousand dollars. We kind of lost focus on this because it was a slow process, it was expensive, and the results weren't as promising as we wanted. But it was still a number of addresses that we discovered.

After we collected this set, we decided to do an ICMP ping scan and see how many of these hosts were still live. To our dismay, almost none of them were up. This led us to the discovery of something called ephemeral IPv6 addresses. Just as we have ephemeral UDP and TCP ports, we have ephemeral IPv6 addresses, and in these cases outgoing connections will have a unique ephemeral IPv6 address per connection. So most of those 92,000 addresses that we discovered were actually not live. And pretty
pointless. So now we've spent ten months and a thousand dollars and have basically nothing. And so we decided to switch gears and start looking at some different techniques for predicting these lower-entropy non-PSLAAC addresses.

Yeah, and it's the hallmark of a good research project when you spend a lot of money and a lot of time for absolutely no reason. That's how you know you're going in the right direction.

So we talked about the PSLAAC addresses. Let's talk a little bit more about the ones that don't have a lot of entropy. We started looking at these addresses. This is a sample of addresses from the public data sets, and just by squinting your eyes you can see that there's some structure in them. This was our hypothesis: looking at these, you can see that there are a number of byte boundaries that people are commonly iterating upon. You have, like, ::1, ::2. So clearly devices are getting leased IP addresses from DHCP servers. There must be a lot of structure in these. And so what does any self-respecting professional do when posed with a problem with data?
Machine learning, obviously. This is the solution to everything.

So not only are we not network engineers, but we're not machine learning experts either. Basically we worked with a friend of ours who is a machine learning expert, and he told us that we should build an autoencoder. I'm going to explain this in the way that it was explained to me, by way of how the human eye works. Light bounces around, it bounces off of objects, it goes through the lens of your eye and hits the retina on the back of your eye, and your brain interprets that as your vision. If your lens is damaged, then the data that is processed on the back of your eye is imperfect. It might be blurry, it might be the wrong shape, but the point is that when the lens is damaged, you have an imperfect image that comes through on the other side.

The autoencoder is very similar, in that you basically build this lens, and then when you push data through this lens, it changes the data. It transforms the data a little bit. But the very interesting part about an autoencoder is that the way in which it changes the data, the error in the lens, is actually representative of the structure of the data that the lens was built from. So the idea is we take all the addresses that we know about from public data sets and we build this autoencoder. Then we take the addresses and put them through it. The autoencoder changes those addresses in a way that is representative of the structure of the IPv6 addresses that we trained on, and then we have new addresses that we can scan.

And of course, this worked completely well. Just kidding, this did not work at all. We basically could only make a completely perfect lens, so that when we took a data set and passed it through the lens, the data that we got out the other side was exactly the same as what we had passed in. So, not particularly useful. Fairly discouraging. We'll throw this on the pile with the thousand dollars and the PSLAAC stuff. But we're like, okay, we
have to at least be on the right track here. And so then we found this paper, which is the Entropy/IP paper. This is from a group of researchers at Akamai who looked at the address structure of on the order of billions of IPv6 addresses that were traversing the Akamai network. The conclusion that they came to is basically that there's a lot of structure in these addresses. The graph that you see up here in the top left-hand corner is actually mapping entropy by bits of the IPv6 addresses that were analyzed. You see that on the right-hand side you have very high entropy. That makes sense, those are the ranges that are being iterated over. In the middle you have very high entropy. That's the /64 boundary, so you would expect there to be a lot of networks provisioned there. But everywhere else there seems to be a fairly low amount of entropy. So you're like, okay, I feel like we're at least headed in the right direction. What can we do to solve this problem?

And we did what we're really good at: we got really dumb about it. So here's what we did, because we basically just needed something, anything, that works: something that generates addresses, scans for them, takes the live addresses back, feeds them back in, and then continues in this loop.

We came up with this very simple scheme where we take an address and we break it down into its 32 constituent nibbles, a nibble being four bits. Then we count the occurrences of nibble pairs. So we say, okay, in position zero, when the current value is 0x2, we have seen the next value be 0x8 one more time. Okay, in position one, when the current value is 0x8, we have seen the next value be 0x0 one more time. We do this for every nibble of every IP address that we have in our data set, and we end up with a probability distribution, on a per-nibble basis, for predicting what the next nibble will be based on the value of the current nibble.

With this, to actually predict an IP address, we would start with 0x2 and we would say, okay, what is the probability distribution for all
of the nibbles after 0x2 in position zero that we've seen before? We would get all those probabilities, we would create a weighted die, and we would roll that die. Whatever comes up is what we would go with for the next value. So now we have a value for position one. We would do the same for position two, position three, and so on and so forth, and we get an IP address out of it.

And so we do this. And another word of warning: whenever you're doing research and the result seems too good to be true, it's probably too good to be true. We generate 10 million addresses in this way, we do an ICMP scan across all of them, and 50,000 of them respond. We're just like, wow, I can't believe that worked. I guess we're just really good at this.

False. This is when we learned about these things called alias networks, and what an alias network range is. To this day we still don't know why these things exist, but it's where every single IP address in an IPv6 network range is mapped to a single host. So there might be two to the power of 96 hosts in an address range, and every single one of them will respond to an ICMP ping scan. When you have a probabilistic model and a feedback loop that scans, takes the live addresses, feeds them back in, and then scans some more, you very quickly come up with a completely useless tool, because it's just really good at finding these sorts of ranges, since they have such a significant representation in the data set.

So we had to figure out how to detect these alias network ranges, and here's how we did that. We take an address that responded to a ping scan and we say, okay, I'm gonna wrap you in a /96 network, then I'm gonna generate eight addresses at random in that network, and then I'm gonna ping scan all eight of those. The idea is that, okay, we think that this address might be in an alias network range. We're gonna take a network range around it and generate a bunch of random addresses in it,
because if it is an alias network range, then these are going to respond, and the likelihood that we guess four live hosts at random out of 4.3 billion addresses is pretty small. So we ICMP scan, and if 50% of the addresses respond, we know that that /96 is an alias network range.

But the network range might be bigger than a /96, so now we need to find the boundary of where that network range ends. We take this address and map it to its bits. We know that the rightmost 32 bits are within the alias network range, and the other 96 bits we don't know about. So we're going to do a binary search. We're gonna take the right half of the bits that we don't know about and flip them, and then we're gonna scan that address. The idea is that every single address in an alias network range is going to respond, so if all of those bits are in the alias network range, then this address will respond too.

One of two things happens then. Either we don't get a response, and in that case the left bits that we didn't test are not in the network range, and the boundary exists within the bits that we did flip. Or we do get a response, and it's the opposite: the bits that we flipped are clearly within the alias network range, and the bits to the left of them are the ones that we don't know about. Then we rinse and repeat. We do this five or six times, and you actually find the exact boundary for where the alias network range ends. Then we blacklist that network range and remove any addresses in our data set that are from it.

And that is where our first iteration on this project ended. We would generate these addresses, scan for them, run the ones that responded through the alias network detection, remove the ones that were in alias networks, take the results, put them back in the model, and keep going. It worked reasonably well. We could find one new, novel IPv6 address that we hadn't seen before once every 15 seconds, so four a minute. Not great, but at the
same time, that's better than guessing out of two to the power of 128.

This is actually the fourth time we've spoken on this topic, and we don't like giving the same talk twice. So every time we give a talk, we add a little bit more to it, we improve it a little bit. One of the more recent updates that we did was about getting less dumber. Hard for us to do, but we tried. We wanted to focus on having a better address discovery rate, and when we were looking around we found this paper called 6Gen, which was written by these folks. It's basically a further iteration of the Entropy/IP paper. Really interesting stuff, but the coolest thing, which really resonated with us, was their notion of an IPv6 address cluster.

In our previous probabilistic model we could only have causality between two adjacent nibbles, because we were only checking the probability distribution of the next nibble every time we looked at one. But there can be causality between nibbles that aren't adjacent. They can be on almost completely opposite sides of the address. So these folks define an IPv6 address cluster as an IP address and a set of wildcard nibble indices, and I'll explain more about what that means in a second. When they're evaluating how good a cluster is, they consider two things. One is the capacity: how many possible IPv6 addresses are there in this cluster? The other is density: of all of those IPv6 addresses, how many of them are in your input data set?

So let's walk through what one of these clusters looks like. We have the same address from before, and we say, okay, we're gonna create a cluster of size one with no wildcard indices. And that's it. It's fairly uninteresting, because it's just a cluster of size one. It can only have one address in it. Then we say, okay, you know what, I want to upgrade this cluster to contain this other IP address. When we're upgrading a cluster to contain another IP address, we break down both IP addresses, the one that the cluster is based off of
and the one we want to add, and we say which nibbles are the same and which ones are different. In this example there's only one nibble that is different. It's right there in the first index. So we say, okay, in order to upgrade this cluster so that it contains both of these addresses, we have to add a wildcard index in that position, because the wildcard matches any nibble. The rest of the nibbles are the same. So now this is the cluster that we would have if we upgraded it to contain both. In this case, because there's only one wildcard index and there are 16 possible values for a nibble, the capacity here is 16. There are 16 possible IP addresses in it. There are only two data points that we used to feed into it, so our data set is of size two, which gives us a density of 12.5%. And just to drive this home, these are all of the IP addresses that exist within that cluster.

The original algorithm I'm gonna skip over a bit. But another note on research projects: if you're ever reading a paper, and the paper has one section that says "algorithm" and the next section says "optimizations for how you can get this algorithm to run on your computer," it's probably an expensive algorithm. It's probably not gonna work in your tool all that well. This is that case. It's quite expensive. But we based our work on this, so without what they put together we would not be standing here in front of you able to say that we accomplished what we did.

Here's the idea behind how we use these clusters. We take our initial data set, and for every IP address in our data set we create a cluster of size one. So we have as many clusters as IP addresses when we start. Then for every single one of those clusters we ask the question: hey, what would the best wildcard index to add to this cluster be? So we say, okay, for this cluster, what if I put a wildcard index at index zero? How good would that cluster be? What's its density? Okay, how about index one?
How about index two? How about index three? So we end up with, for every cluster that we start with, the best upgrade for every single one of them. In some cases there are no good upgrades, and that kind of means that this IP address is just hanging out there in the ether without any really adjacent neighbors. There's nothing near it. We take those clusters that have no adjacent neighbors and we set them to the side.

So with those to the side, we have our set of clusters and we have our set of potential upgrades. We sort the upgrades by density, because we want to take the best upgrades first. Then we take the upgrade off the top, put it into our cluster set, recalculate what its next best upgrade would be, and put that back on the list of upgrade candidates. So we take one off the top, figure out what its next step would be, put it back on the list of potential upgrades, and rinse and repeat.

We have a scoring system for our cluster set, and basically we recalculate the score every time we bring a new candidate over into the cluster set. What we see is that when we start, the score goes up, up, up, up, then it hits a peak, and then it starts coming down. We want to find the model when it's at that peak. So we just watch the score, and once the score starts decreasing, we say this is the best possible cluster set that we can build that describes our data set, hopefully with high density. And the green here shows that it's a better algorithm than the previous one.

So once we have that data, how do we turn that into IP addresses?
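As a rough illustration of the cluster bookkeeping described so far, the sketch below models a cluster as a seed address plus a set of wildcard nibble indices and computes the capacity and density figures from the talk. It is only a sketch under assumptions, not the 6Gen or IPv666 implementation: the names `Cluster`, `to_nibbles`, and `best_single_upgrade` are hypothetical, the scoring and stopping rule are left out, and the `2001:db8::` addresses are documentation examples rather than data from the talk.

```python
import ipaddress
from dataclasses import dataclass


def to_nibbles(addr: str) -> str:
    """Expand an IPv6 address into its 32 hexadecimal nibbles."""
    return ipaddress.IPv6Address(addr).exploded.replace(":", "")


@dataclass(frozen=True)
class Cluster:
    seed: str                           # 32 nibbles of the address the cluster is based on
    wildcards: frozenset = frozenset()  # nibble indices that match any value

    def capacity(self) -> int:
        # Each wildcard nibble can take 16 values.
        return 16 ** len(self.wildcards)

    def contains(self, nibbles: str) -> bool:
        return all(i in self.wildcards or nibbles[i] == self.seed[i]
                   for i in range(32))

    def density(self, dataset) -> float:
        # Fraction of the cluster's capacity that appears in the input data.
        unique = {to_nibbles(a) for a in dataset}
        members = sum(1 for n in unique if self.contains(n))
        return members / self.capacity()

    def upgraded_to_contain(self, addr: str) -> "Cluster":
        # Add a wildcard at every position where the new address differs.
        nibbles = to_nibbles(addr)
        differing = {i for i in range(32)
                     if i not in self.wildcards and nibbles[i] != self.seed[i]}
        return Cluster(self.seed, self.wildcards | differing)


def best_single_upgrade(cluster: Cluster, dataset):
    """Try one extra wildcard at each index; return (index, density) of the best."""
    best = None
    for i in range(32):
        if i in cluster.wildcards:
            continue
        candidate = Cluster(cluster.seed, cluster.wildcards | {i})
        d = candidate.density(dataset)
        if best is None or d > best[1]:
            best = (i, d)
    return best


dataset = ["2001:db8::1", "2001:db8::2"]
base = Cluster(to_nibbles(dataset[0]))
upgraded = base.upgraded_to_contain(dataset[1])
print(sorted(upgraded.wildcards), upgraded.capacity(), upgraded.density(dataset))
print(best_single_upgrade(base, dataset))
```

With these two example addresses only the last nibble differs, so the upgraded cluster has one wildcard index, a capacity of 16, and a density of 2/16, which is the 12.5% figure from the walkthrough.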
Well, when we want to generate an IP address, we take a cluster at random from our data set, and then for every one of the 32 nibbles we flip a coin. If the coin comes up heads, we generate from the cluster. If the coin comes up tails, we generate from a probability distribution built from all of the addresses that we did not include in our clusters, the ones that we put to the side when we said these don't really have any neighbors.

In the case where we're generating from the cluster, it's fairly simple. If the nibble at this index is a wildcard index, then we just pick a random nibble, equally weighted. If it's not a wildcard index, then it's the value of the IP address at that nibble from that cluster. And if we generate from the addresses outside the cluster model, then it's just a probability distribution over the values of the nibble in this position that we've seen in all the rest of the data. So that's how we make better progress on generating the candidate addresses that we want to scan. And now Mark's going to talk a bit about how we improved upon fanning out from them.

So as Chris mentioned, with the improved 666gen algorithm we're able to improve our scanning coverage and generate a good set of addresses from the probabilistic model. Now we want to use these newly discovered addresses as landing points and fan out from there. What we mean by that is, once we've discovered a new address, we can assume that there are potentially other live addresses that follow a similar structure to that address. We can also look for neighboring networks and neighboring hosts, depending on the specific addresses we found.

First is the nibble-adjacent fan-out, which is how we look for addresses with a similar structure by varying the discovered address 4 bits at a time. In this example we have a target network of 2000::/4. The /4 tells us that the first nibble, the first 4 bits, is fixed, so the 2 is always going to be the same. And then for
the other 31 nibbles, we generate 15 addresses each, for the 15 values that are not present at that position in this address. So what that looks like is we take the last nibble. In this case it starts with a value of 1, so we generate addresses with 2, 3, and so forth in that position, until we've generated 15 new IPv6 addresses from that initial address by varying that one nibble. And we go down and do this for each of the 31 nibbles that are not fixed in that target network. So in this case, with 31 nibbles and 15 new addresses each, we've generated 465 new candidate addresses from that one input address. And we like to do this because scanning is very, very cheap. It's computationally inexpensive to scan addresses. So once we've found an address, we want to ask: are there other addresses that are similar to this, that have basically the same structure?
Then we also do a sequential fan-out, looking for both neighboring networks and neighboring hosts. One of our observations is that networking equipment, especially customer premises equipment like cable and DSL modems, will frequently be assigned the ::1 address of a /64. So you'll have a router which has a /64, and the first address in the right 64 bits is the router address. So when we find one of these ::1s, we then look for incrementing and decrementing neighboring /64 networks: we take that 64-bit network prefix and we count up and we count down monotonically. And the idea here is that if I'm an ISP and I'm allocating addresses to my customers, I am likely to do this kind of sequential address assignment.
And then we do something similar for neighboring hosts. So we have a ::1 address on a /64. This is common to see in a modem, which will then assign DHCP leases to clients monotonically increasing from there. So when we see the ::1, we then generate the list of addresses ::2, ::3, ::4, and so forth. And this allows us to find potentially neighboring addresses from this initial input set. And with
this we've been able to really answer the question, "Is us any smarter? Have we gotten any less dumber?", and how much have we actually improved our scanning algorithm? And it turns out that combining the data sets we've been looking at is worlds better than our previous implementation. So with version 0.2, which was our first real good attempt, we were pretty happy. We were finding around 60,000 addresses over the course of eight days, and 80% of them were distinct, not in previously seen public data sets. And this was pretty cool. But then we got version 0.3 rolled out, and this was a huge improvement. So now we're able to find, in this case, 1.57 million addresses over the course of an hour, and 78% of them were not in the previous public data sets. In terms of addresses found per hour, this is a 503,000% improvement over v0.2. And so we're pretty happy with these results.
And so we decided to take a sample set of 100,000 addresses from these discovered lists, and we did TCP connect scans on a number of common ports. The objective here was to see: are these live hosts, or is there some false-positive situation going on? And we had a lot of open TCP ports, which tells us that these were live hosts. We've seen lots of networking equipment, both infrastructure and customer premises, lots of no-auth MongoDB instances, ancient SSH, ancient Telnet. And so this tells us that we're getting closer to being able to look at the hypothesis we started with, that the IPv6 world is potentially riskier, security-wise, than IPv4. We've been working on this project for a year and a half, and we started with this one objective in mind, and it's taken us a year and a half to get to the point where we can get to it. So it's pretty exciting that we're at this point now, and we're looking forward to continuing this work. And now Chris is going to take you to the cloud.
To the cloud, up and away. So as I said before, this is the fourth time that we're speaking on this topic. We don't like to give the same talk twice, and every time we talk, we put something
new into the code base. So last time, last iteration, we made a much better scanning algorithm and scanning process, and it was pretty cool. We found 50,000 addresses, and then we're watching this run and it just found 150,000 in like five minutes. That's amazing. And so we did a little bit more analysis, and it looks as though this can scale out horizontally pretty well. So we can run two instances of it, and they will find fairly unique and independent addresses from one another. So that puts us in an interesting situation, where this is really well suited to letting folks gather the data on their own and then potentially aggregate it with us.
And so we put functionality into the tool where, when you run it the first time, it prompts you and asks whether you are okay sharing the IP addresses that you find. And if you say yes, it just uploads the addresses as you find them and puts them on our web portal. The link is not up there right now, but it was up there before: it's ipv6.exposed. And so the whole point here is to basically aggregate this data set across anyone who is comfortable running the tool and sharing data with us, and then enable users of this web portal to query the data, so they don't even have to run the tool on their own to make use of the data that we're collecting. So if this sounds interesting to you, please take a look at our tool, try running it, and take a look at our data set. We'd love to see what you think.
And so, to close things out for me, this is where you can get our tool. If you have Golang 1.11 or higher installed, just do go get and then that GitHub URL. My handle is lavalamp. I'm not fast enough to get just lavalamp as my username anywhere, so I put a dash after it for my GitHub account. You can find it there. Also, if you feel like contributing, we'd love more contributors. And hopefully you guys like it. There's a handful of slides here where you can find out about the various utilities. So you can discover new addresses. You can scan to see if a network is aliased.
You can generate addresses with the model that we have. You can take a set of input data and generate your own model. You can generate a blacklist. You can clean a list based on the blacklist that's packaged with the tool. And then you can convert the file to various IPv6 file formats. This is a personal blog. Again, a link to the tool. And Mark's going to bring things home.
We talked about the reason we got into this project, as an offshoot of our Comcast research and interest in IPv6. We looked at how it's really difficult to actually scan the IPv6 address space due to this huge number of addresses. We talked about our very, very failed attempts at doing honeypotting, our slightly improved attempts with doing probabilistic modeling, and then our big improvements with the 6Gen algorithm. We looked at our results, where we can do 10 million newly discovered addresses. We talked about our cloud persistence service, where we can actually crowdsource this data set of global IPv6 addresses. We looked at our tool, IPv666, which is available for you to use, and we encourage you to do so.
And I just want to say that we've been working on this project for a year and a half, and we've gone through a lot of failures. We've learned a lot of lessons. And we hope that you're able to use the fruits of our effort and apply this to your own research. And we're happy to be here at DEF CON China 1.0 to share this with the community. These are some links to papers and projects that have inspired us. We recommend you take a look if you're interested in this space. And now we have just a few minutes for Q&A. Thank you.
Cool. Well, if there are no questions: if you see us, please stop us and talk to us. We're going to be around the con most of the day. And yes, thank you all. Thank you.
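
The candidate generation and fan-out steps described in the talk can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the IPv666 implementation (the tool itself is written in Go): the function names, the fair 50/50 coin, the shape of the per-position weight tables, the requirement that the target prefix falls on a nibble boundary, and the example addresses under the 2001:db8::/32 documentation range are all hypothetical choices made for the example.

```python
import ipaddress
import random

NIBBLES = "0123456789abcdef"


def to_nibbles(addr):
    """Expand an IPv6 address into its 32 lowercase hex nibbles."""
    return f"{int(ipaddress.IPv6Address(addr)):032x}"


def from_nibbles(nibbles):
    """Turn 32 hex nibbles back into a compressed IPv6 string."""
    return str(ipaddress.IPv6Address(int(nibbles, 16)))


def generate_candidate(cluster, wildcards, leftover_weights, rng):
    """Build one candidate address, flipping a coin per nibble.

    cluster: 32-nibble string describing the chosen cluster (assumed shape).
    wildcards: set of nibble indexes that are wildcards in that cluster.
    leftover_weights: 32 dicts mapping nibble -> weight, built from the
        addresses that were left out of the clusters (assumed shape).
    """
    out = []
    for i in range(32):
        if rng.random() < 0.5:  # heads (assumed fair coin): use the cluster
            out.append(rng.choice(NIBBLES) if i in wildcards else cluster[i])
        else:  # tails: sample from the leftover distribution for position i
            values, weights = zip(*leftover_weights[i].items())
            out.append(rng.choices(values, weights=weights)[0])
    return from_nibbles("".join(out))


def nibble_fan_out(addr, prefix_len):
    """Vary each non-fixed nibble through its 15 other values."""
    fixed = prefix_len // 4  # assumes the prefix ends on a nibble boundary
    nibbles = to_nibbles(addr)
    out = []
    for i in range(fixed, 32):
        for n in NIBBLES:
            if n != nibbles[i]:
                out.append(from_nibbles(nibbles[:i] + n + nibbles[i + 1:]))
    return out


def neighbor_networks(addr, count):
    """Step the /64 prefix up and down, keeping the host bits unchanged."""
    value = int(ipaddress.IPv6Address(addr))
    step = 1 << 64  # adding this moves to the next /64 network
    out = []
    for k in range(1, count + 1):
        for candidate in (value + k * step, value - k * step):
            if 0 <= candidate < (1 << 128):  # stay inside the address space
                out.append(str(ipaddress.IPv6Address(candidate)))
    return out


def neighbor_hosts(addr, count):
    """From a ::1 address, list the next hosts: ::2, ::3, and so on."""
    value = int(ipaddress.IPv6Address(addr))
    return [str(ipaddress.IPv6Address(value + k)) for k in range(1, count + 1)]
```

Calling `nibble_fan_out("2001:db8::1", 4)` gives the 31 x 15 = 465 candidates mentioned in the talk, and `neighbor_hosts("2001:db8:0:5::1", 3)` gives the ::2, ::3, and ::4 addresses on the same /64.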