Okay, so a few months ago, we were listening to a talk about the DNS flag day coming up, and someone was talking about the experience of running the EDNS flag day code over a couple of TLDs. So someone said, it would be interesting for us at ICANN to try this with all of the TLDs that we have. This is a talk about the attempt to do that. Now of course, I won't have the same nice graphics they had in that talk, and for another thing we're past the flag day now, so it doesn't matter as much. But it was an interesting experience to talk about. To give you an idea of what this entailed: if I had actually run every single test involved here, I'd have executed 11 billion digs. I didn't do that; I mimicked doing 11 billion digs. But that's the size of what we have. We have a lot of information. Now, about the code I used: I know that the EDNS test code that ISC developed changed over time. When I met with the developer, Mark Andrews, back in October of last year, he gave me a zip file of his stuff. I don't know what version it is; I just have his stuff. As I worked on it, he kept saying, you should go to GitHub and get my stuff that's now in C. But at that point it was too late for me; with all the work getting his stuff running, I had committed too much. So I'm using a bunch of shell scripts he had written to do all these digs, which is what sits underneath everybody else's work out there. I know he says he has more efficient C code; I didn't use that. So this is not really a judgment of that code. So why did we do this? ICANN has a contractual relationship with about 80% of the TLDs. That means of the 1,500 or so TLDs out there, I have access to the zone files of 1,200 of them as a result. That doesn't mean I have 80% of all the Internet's usage of the DNS.
It's just 80% of the TLDs, because, of course, some ccTLDs, which I don't have any of, are very big. And a lot of gTLDs... Could I get more domains? I could, but I'm trying to make it a little easier to pipeline this stuff. On the other hand, a lot of gTLDs are very small. So I don't want to promise the distribution covers everything out there, but I have a lot of TLDs. Basically, we thought it would be good to run this over those 1,200 TLDs. The other thing is, I knew at the start that a couple of things would not play well. One is, I wouldn't do a horse race of TLD versus TLD. I don't want to do that; it's not fair for us to grade TLDs against each other in any way. Also, all of the alerting that Mark Andrews was doing, we don't have access to that; who holds the contacts is a whole other department. But it is interesting for us, at the protocol level, to look at the health of the system. So we got some insight into the EDNS0 situation, though not a whole lot, because frankly, by the time we got down to that part of the data we had already done a lot of work and a lot of analysis. What's really going on is that I've been building a platform for ingesting zone files, parsing them, pulling things out and finding things. In fact, I've already moved on to other projects since I did these slides last week, and I'm thinking about the new stuff already. One of the lessons of the KSK rollover project was that our ability to manage the Internet is pretty weak. It's hard to manage the network. So one of our interests is to see what we can do to manage, through measurement, what's happening out there: to assess what is being done and what isn't, how well operators are keeping things linked up. Are the glue records good? But that's further research beyond this.
So, to start quantifying things, the workload: 1,128 zone files, 193 million delegations out of that, which is a lot, and 450 million NS records through all of this. Glue records, 3 million, mostly v4. I won't go through all the numbers, of course; it's Sunday afternoon and we're at work. Graphs: I don't have any, I'm too lazy. I'm not like Rory, I'm the opposite of Rory; I don't like graphs. I should, but I don't. The problem is I'd have thousands of graphs, one for each TLD, and you're not going to go through that. On the other hand, for most of this, any graph you'd see is a long-tail distribution; I'll tell you when it's not. The other ground rule: I don't like to name names. I do have one really large gTLD we all know about, and I'm not going to talk about it, but I have one really big one; everything else is roughly smaller than that. In fact, that one big TLD takes half the processing time just to parse. I did a race one time where I parsed it in one process and parsed all the others in another, and they tied at the end: 1,127 to one, same elapsed time. Two-thirds of all of my results are in that one big TLD, and so on. So things are kind of skewed. There's no ccTLD data in this, just to keep it easy. I could get some of it, but I don't want to mix apples and oranges, in some sense. And .ARPA is the only thing from the reverse map that pops up here, because we happen to have that. One more thing: when I get to the IP addresses, I changed them too, because I know that in Europe IP addresses are, like, illegal to mention, I think. So, name server names: 3.2 million of them, and almost none have IDN labels. Operators still work in ASCII; the infrastructure is still pretty much an ASCII thing. Over 99% of all name servers are just ASCII names. That means the tools that are moving into IDN may sometimes be premature for the operators out there.
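As a rough sketch of how numbers like "193 million delegations" and "450 million NS records" get tallied, here is a minimal, hypothetical counter for a zone file already in presentation format. This is a toy: real TLD zone files need a proper master-file parser, and the names and record layout here are made up.

```python
def tally_zone(lines):
    """Count delegations (unique owner names holding NS records) and
    total NS records in a zone file in presentation format."""
    ns_records = 0
    delegations = set()
    for line in lines:
        parts = line.split()
        # assume a simple master-file layout: owner ttl class type rdata
        if len(parts) >= 5 and parts[3].upper() == "NS":
            ns_records += 1
            delegations.add(parts[0].lower())
    return len(delegations), ns_records

# made-up zone snippet standing in for a real TLD zone file
zone = [
    "example.tld. 3600 IN NS ns1.example.tld.",
    "example.tld. 3600 IN NS ns2.example.tld.",
    "other.tld.   3600 IN NS ns1.example.tld.",
]
print(tally_zone(zone))  # (2, 3): two delegations, three NS records
```

At 193 million delegations, the real pipeline clearly needs streaming rather than lists in memory, but the counting logic is the same.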
I know there was a flap about one tool suddenly printing native IDN labels, which broke scripts; I was bitten by that too. Most of the name servers I deal with are in gTLDs, which is not too surprising. In fact, 85% of the gTLD name servers had names that were themselves in a gTLD. And again, it's pretty much still all ASCII and pretty much still gTLDs. Glue records: I have 3.2 million, of which 2.7 million were unique addresses, almost all of them v4, not too surprising. What I did find surprising, at the bottom of this, is that the number of name servers with only v4 is still huge compared to dual stack. I thought dual stack would be more mature. That was the first number that surprised me: I thought dual stack was just a given, and not at all, not in glue records. There are 5,000 brave souls doing v6 only; I'll say they're very smart people in an alternate universe. Glue addresses per name server: mildly interesting. There were a lot of name servers that had no glue at all. Now, I should point out that after I made the slide, I realized that's actually not such a bad number, because 450,000 of them were named under ccTLDs, which means I wouldn't have that glue anyway. That's why it's good to have these split out. Most of the ones with no glue were glue I shouldn't have had in the first place, so that's very excusable; others are just name servers that live somewhere else. And I'm now discovering there's a lot of bad glue, but that's beside the point. Most servers have one glue address. This is not a long-tail situation, which is kind of funny: it goes 2 million at one, down to 50,000, then to 3,000, then back up to 69,000 at four, and so on. And by the way, 0 to 13 is the entire range of values for this graph. There's nothing beyond that; this is all of it. Because you can only have 13 NS records, because by law it's... never mind. A lot of registries actually assumed that when they wrote their original code, so it stays that way.
Now, later on I'll talk a bit more about addresses. It's kind of murky; I call it muddy data, because studying glue records as glue records is a whole study in itself, and I'm doing that right now as my next step. But this tool, if you gave it no glue, would go out, get the authoritative data, and mix it all together. So some of my address material later on is a little muddy, but it was interesting to do. Eventually I'd like to separate the authoritative view of the data from the glue and see how well we've kept it up to date. So: zones per name server. Almost 2 million of the 3.2 million name servers out there serve only one zone. Now, that could be vanity naming: all different names in front of the same process. But that's the way it lines up from the NS records. Going down, you see it's a long tail, and then the maximum, on the next slide, was kind of interesting. There is a name server out there that has over 4 million zones on it. Now, I mocked up the names on purpose, of course, because I don't like naming names, but the two and the one in the first two lines are on purpose: some operator's NS2 has more zones than its NS1, at the 4 million mark. I would think if I had that many zones I'd have NS1 be the bigger one, because all the other ones are, or I'd have it balanced, or automated in some way. An audience member noted that sometimes a domain will have either NS1 and NS2, or some name server owned by the domain owner plus NS2, but for that the difference is too small. Right, it's just interesting. So you're saying that in some cases you see an owner's server plus NS2, for the recording, basically. Yeah. The other big ones are NS1s and NS2s; these are the top six name servers out there, and it falls off quickly.
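The zones-per-name-server tallies come straight out of the NS records; a minimal sketch of that counting, with made-up names, might look like this:

```python
from collections import Counter

# (zone, name server) pairs as parsed out of NS records; names are made up
ns_records = [
    ("alpha.tld", "ns1.example-host.net"),
    ("alpha.tld", "ns2.example-host.net"),
    ("beta.tld",  "ns1.example-host.net"),
    ("gamma.tld", "ns1.example-host.net"),
]

# how many zones each name server appears for
zones_per_ns = Counter(ns for _zone, ns in ns_records)
print(zones_per_ns.most_common(1))  # [('ns1.example-host.net', 3)]
```

Sorting the resulting counter is what produces the long-tail view and the 4-million-zone outlier at the top.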
There were a couple more at 2 million, but everything else fell right down. I just thought it was humorous that the top one had a few more zones on NS2 than on NS1; that's weird. Okay, so multi-tenancy. This actually came out of a conversation: how many zones are actually on one process? You have a process, you have name server addresses, you have name server names, and so on, and I've worked in places where I know how we mixed and matched those things. It would be interesting to trace that out. This study can hint at it, but I haven't gotten there, because in some cases we have vanity IP addresses and vanity name servers, which give you an inflated one-to-one picture. There are ways to figure this out over time, and it would help us look at the structure of DNS hosting out there: how many major operators there are, so we can start calibrating some assumptions about the CDNs and who you're actually talking to, who's doing the consolidation, and the whole Dyn attack from a couple of years ago. Anyway, it's interesting to know who has what, where. TLDs per name server: this is not as interesting, but I put it up anyway. Most name servers have only one TLD represented, meaning the zones they serve are all in one TLD. There were two name servers that had zones in 539 TLDs. Hosters can do what they want. So now, compressing the tests. Many people have recognized that the way this tool worked, for every zone, on every name server's IP address, it would run a set of digs. It's not really necessary to do all that, so we just pick one zone per server. So I did that. I thought there'd be a 144-to-1 compression in doing that, given 144 zones per name server as a straight average. Which basically gave me the expected number of tests, in the middle here.
I expected to have 2.7 million tests, plus some unknown number for the glue list, because those wouldn't be discovered until I ran the tool. The reason I bothered to put this up is that I had to figure out how many VMs I needed to run all this, so it mattered to me. When I launched it, I came back to 3.5 million total tests, which break down a certain way. Now, when I expanded back out, saying for this name server, here are all its zones, and multiplied, that was almost a billion things tested: 999 million and so on. That was a 283-to-1 expansion; I had forgotten about addresses multiplying in there. I was able to do all the parsing and take all the data in 24 hours, which I find important because it's fresh. Analysis took another day or two, but at least the data was all pretty fresh from the zone files. For the glue list, I found that basically most of them had only one address, and again, that 1.465 figure is very close to the number of name servers named under ccTLDs, so that's not too surprising. One of the servers that had no glue in my files had 58 address records; that's the top end. There's one at 33, and a long tail applies here too. Now, looking at just the addresses... Sorry, a question from the audience: how many were errors? Well, 210,000 were either NXDOMAIN or NOERROR with no data. These represent name servers of zones that obviously will not answer; the zone is probably broken. I haven't gotten too far into that, but I'm sure there are some zones out there whose name servers all just don't work. On the terminology: when you ask a question in the DNS and the name you want has no information anywhere, that's called NXDOMAIN. But if the name exists, just without address records, we call it NOERROR, no data. Sorry, that's protocol jargon.
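The compression and expansion steps can be sketched like this: test each name server once against one representative zone, then fan the verdict back out over every zone behind it. This is a toy model; the real scripts also fan out per address.

```python
def plan_tests(delegations):
    """Map each name server to one representative zone to actually test,
    and remember every zone behind it for expanding results back out."""
    representative = {}   # name server -> the one zone we will dig
    behind = {}           # name server -> all zones delegated to it
    for zone, ns in delegations:
        representative.setdefault(ns, zone)
        behind.setdefault(ns, []).append(zone)
    return representative, behind

# made-up delegations: three zones on ns1.x, one on ns2.x
delegations = [
    ("a.tld", "ns1.x"), ("b.tld", "ns1.x"),
    ("c.tld", "ns1.x"), ("d.tld", "ns2.x"),
]
rep, behind = plan_tests(delegations)
actual = len(rep)                               # tests really executed
virtual = sum(len(v) for v in behind.values())  # results after expansion
print(actual, virtual)  # 2 4
```

With real data the ratio between `virtual` and `actual` is the 283-to-1 expansion mentioned above.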
So I mean the name existed, but it had no address to give: it could be a CNAME, an MX record, a TXT record, a delegation, anything but an address record. I didn't dive into why. This is what I call muddy data: glue records and authoritative data mixed together. So let me skip across here to v4. In v4 there were 1.1 million addresses; 692,000, about half, were one name server with one IP address. For the others there was some sharing going on. What I have here is anonymized: A1.B1 stands for a /16, which had 332,993 name server names in it. That doesn't say a whole lot at the top. But if you look into the /24, A1.B1.C1, almost all of them were there, and if you go down further, all of them are pretty much in one /24, which means all those names fit into one range of 256 addresses. I'll go to the next slide, because it's really funny to look at. And to give you a sense of scale, the next /16, A2.B2 (again, I've obviously changed the IP numbers), had only 46,000 name servers, and so on. The structure is less interesting when you can't see the real numbers, but down in the top six there was one that showed a more even distribution: 21,000 name servers in the /16 and about 5,000 in each of the /24s below it, something that looks more natural. For that one really large /24 I actually have the counts for each address that appears in it. There are 16 addresses in that /24, and they each have between 20,500 and 21,000 name servers. So obviously someone has done this on purpose, put all these names on here, and it's just really funny. These are two blocks of eight addresses, and they do belong to one provider, because I know what the names were.
But the other funny thing: usually when I see things like this, I expect routable blocks, and these don't line up with CIDR blocks, so I don't know what the deal is. Someone has these 16 addresses with 330,000 name server names on them, and they have to be inside a hosting provider, because you can't route that; you can't route a /29, which is roughly what this would be, and it's not even a clean /29, because routing it would have taken four routes or whatever. Anyway, I just found this really quirky. And then, for scale, B, C, and D were the next blocks that were that big. Again, this all goes back to finding out how many people are sharing an IP address for name servers. Now, v6. Because v6 has so few addresses available, I looked to see who was sharing there, and there is some sharing: in one /48 (48 bits out of 128) there are 2,000 names. That's not a big deal; there are a lot of addresses in a /48. But the next two, address two and address three, have all of those names on one address each, and a /128 is the full address. So someone out there has put that many name server names on a single v6 address. Apparently they're running out of space, I don't know. And to give you an idea that there are other culprits, if you look at the single IPv6 addresses further down the list, there are still that many servers per address, which I don't show, because it's illegal to show addresses, I guess. Some people really are sharing one address in v6. We must break this habit, folks. So, I'm almost at the end of my slides and I haven't gotten to EDNS0 yet. But it's interesting how much work it took to get to this point. So, okay, I'll just go into it: these were the results that came out of the testing.
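Spotting that kind of clustering is mostly a matter of grouping glue addresses by prefix. A small sketch using documentation-range addresses (the real addresses were anonymized in the talk):

```python
import ipaddress
from collections import Counter

# glue addresses from documentation ranges, standing in for the real data
addresses = ["192.0.2.1", "192.0.2.2", "192.0.2.3", "198.51.100.7"]

# collapse each address to its enclosing /24 and count members
per_24 = Counter(
    ipaddress.ip_network(f"{a}/24", strict=False) for a in addresses
)
busiest, count = per_24.most_common(1)[0]
print(busiest, count)  # 192.0.2.0/24 3
```

Running the same grouping at /16 (or /48 and /128 for v6) is what surfaces blocks like the 16-address cluster with 330,000 names on it.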
Now, if you see here, I have 997 million total at the bottom, the total tests that would have been run; these are the virtual tests. For 91%, the DNS was okay, which I find really high, which is good considering all the other issues I found. For EDNS0, 76% said they were okay; remember these are name servers, not zones, and the flag day concerns were over zones and operators and all that. For the DO bit, which I expected to be high, 77% said okay. And Shane, you look... Is that higher than the last one? Yeah. Of course, a good reason to squint your face up: the DO bit requires EDNS0. Right. So let me repeat this: the DO bit here scores higher than basic EDNS0, which is kind of funny, because DO requires EDNS0. That's just for the recording. Yeah, that happened. EDNS over TCP: 58%. There were something like 11 tests; I didn't put them all in here, I picked the ones that seemed most interesting to talk about. This one seemed to be in trouble. Of course, if it really were in trouble... well, I have ten more minutes, so let me tell a story. I figured, on the one hand, it was February 3rd, two days past flag day, and if Rome were burning, we'd have known about it by now. But I haven't heard anything, and I didn't see any talks here about the EDNS0 flag day at all, so I'm assuming it went reasonably well. Again, I didn't spend as much time on this part, because I ran out of time, and also it was so much fun setting up the experiments. And the address-not-found failures, this actually goes to your question earlier: I have "no address records" and "no address found" because I forgot to put NXDOMAIN in here; "no address records" was the NXDOMAINs. About 75% of all the names that had no address simply didn't exist in the DNS. The other quarter were names that had something other than an address record. So that's how that melted down. So, lessons learned.
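The NXDOMAIN versus NOERROR/no-data distinction being drawn here fits in one small function. A sketch, with rcode names written the way dig prints them:

```python
def classify_answer(rcode, answer_count):
    """Classify the response to an A/AAAA query for a name server name."""
    if rcode == "NXDOMAIN":
        return "name does not exist"
    if rcode == "NOERROR" and answer_count == 0:
        # the name exists but holds something else: MX, TXT, a delegation...
        return "name exists, no address data"
    if rcode == "NOERROR":
        return "got addresses"
    return "other failure"

print(classify_answer("NXDOMAIN", 0))  # name does not exist
print(classify_answer("NOERROR", 0))   # name exists, no address data
print(classify_answer("NOERROR", 2))   # got addresses
```

The 75%/25% split above is just the ratio of the first two buckets among the no-address results.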
One thing I'll say about this tool: I tried to get a clear yes or no out of it, and instead got a long-winded response about how you have to judge all this stuff. I get the reason. It would have been nicer if the tool had given a thumbs up or thumbs down per server address for EDNS0 compliance, but that wasn't going to happen. To me, the important thing is that it gave me a framework for studying other parts of the registration system. I want the data collection done within a day, so the data is fresh; the analysis can take longer. There are a lot of things in there we can go and look at. One thing I've always been concerned about, and the reason I have so many numbers and counts here, is whether I've actually captured all the results; I'm trying to make sure I don't leave anything on the table. Timeouts always bug me; I want to make sure I know how to handle them. So if you're writing tools out there, be really careful that every response that comes back is recognized by the requester, and that everything can be counted and accounted for. I found some places here where I realized I'd missed something, and it opened up a bug: I had forgotten about a certain set of servers for a while. The goal of all this is to improve our ability to manage the DNS, where managing means not regulating or pushing it around, but knowing where to put more attention to improve parts of it. What parts of this will make the DNS evolve faster? What parts will make everything click together more easily? So, then, to wrap up: how do I get to 11 billion digs? I had 997 million total DNS tests, and each one of those represented 11 digs; the test suite ran 11 things, I think it was 11. And the glue list added 726,000 more digs to get the authoritative addresses.
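Those numbers add up to the headline figure; the arithmetic checks out in a couple of lines:

```python
# 997 million compressed test results, 11 digs behind each,
# plus 726,000 extra digs chasing glue-list addresses
virtual_digs = 997_000_000 * 11 + 726_000
print(virtual_digs)               # 10967726000
print(round(virtual_digs / 1e9))  # 11, i.e. about 11 billion
```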
You add that up, and I used dc, because that's the only calculator I ever learned how to use, and I get to about 11 billion digs out of all this. But that number only ever appeared in dc; there were not 11 billion pieces of traffic out on the Internet. So, that's what I have. Hopefully it's been entertaining, if nothing else. Any questions? I know you all didn't go home; it's Sunday afternoon. One comment from the audience was that many of these operators had seen this in January: a lot of improvements were done in January, with outreach, to avoid, let's say, a world disaster. Yeah, that's good. The difference in the last week was like zero, or measurement error. And on .CZ's experience: they brought the error rate down to zero. Can you do that for the KSK, by the way? We're at 2%. That's good. Another comment here is that DNS firewalls are causing a lot of this. And I understand that's actually why this tool found it hard to give a yes/no answer. I have a lot of experience with this kind of testing. To me, a timeout always meant I didn't know what was going on out there, so I wouldn't qualify it as an error, but as an unknown. In this case it was different, because sometimes you could get through, and then a timeout helped you pinpoint that something else was going on. Knowing that changes how you'd analyze all the results, but then you get into being able to understand all the intricacies of it. I know Mark Andrews has spent a lot of time on this and on contacting people; he certainly flooded me with, basically, too much information, so I'm familiar with that. So, any other questions? On attempting outreach to TLD operators: we hadn't, for a couple of reasons. For one thing, I think Mark had advertised that he had already done that.
And I think, by the time I ran this stuff, Mark was saying it was all pretty much taken care of, so I didn't really even think about doing that. Secondly, we have a certain set of gTLDs we can talk to. With the ccTLDs there's an interesting dynamic: some ccTLDs want to operate entirely on their own merits, some appreciate information, so we're not as forthcoming with some of them as we, you know, probably should be. And as far as contacting anyone else involved, we don't have a direct relationship; we can't just go out to the zone owners, because that goes through multiple layers of who's in charge out there. Had this test been done a year before flag day, and had we found things that needed contacting, I'm sure we would have worked out how to do it. But given the timing, all this data came from January 18th, which was last minute even to get the thing launched and running; I didn't even attempt the outreach part, which is actually what the goal was. So I relied on the other efforts out there. Yeah. On the comment that .CZ has a tool for this: partly because I'm too lazy to do the research, I didn't look at it that much, but I also assumed it would have been overwhelmed anyway, so I didn't feel that bad that I didn't bother. And because I was looking at the code Mark had given me, and I did use some of his reporting, I decided that having an HTML page with red and green wasn't going to scale to 450 million NS records; that's why it wouldn't have been brilliant to use that. But also because I didn't want to split my data up by TLD; I didn't want any kind of horse race of TLDs, that's one thing that's important. I'm trying to look at this broadly, across all of it.
One thing that's important: you have to have goals in mind when you do this kind of research. What do I want to measure, what do I want to know, what do I want to present? In some cases you don't want to do horse races; you want to do a qualitative protocol assessment. There are different ways to use the same tool; that's what it comes down to.