Tom here from Lawrence Systems, and I'm joined by John Todd again. It's been a little over a year since we talked to Quad9. I dove into a video talking about DNS, lots of debate happened, and I was wrong about a few things, but John was kind enough to help explain not just how Quad9 works, how DNS works, and how DNS filtering works, but a lot of other things. I'll leave a link to that previous video, but a few things have changed in the little over a year since we did the last video. How are you doing, John? Great. Thanks for having me again. A lot of things have changed in the last year, a lot of them about Quad9, but actually I want to kick this off by talking a little bit about the thing that's on everybody's mind, and I know you've already talked about it some, which is the Facebook outage earlier this week, on Monday. Oh, yeah. The ripples of that are still washing back and forth across the policy and operational oceans, so we're seeing a lot of interest in it. And I wanted to talk about what our perspective was on it, because I think every part of the industry, whether you're in DNS or routing or authorization, validation, advertisements, everybody saw some effect of this, and it's been very interesting to watch. So like everybody, we saw Facebook go down here ourselves, but then, almost exactly coinciding with that, we saw across our network, which has 180-plus locations worldwide, a dramatic increase in query volume. This was for Facebook, both from humans but also from apps making very rapid requests to Facebook's infrastructure, which wasn't answering. So they asked again, and they asked again, and they asked again. In some locations we saw more than double our normal total query volume.
Almost all of that entire volume was, of course, requests for Facebook. And we responded back with SERVFAIL for all of it, because we couldn't talk with the Facebook back-end servers; because of their BGP issue, they were unavailable. So this was really a fascinating test of some people's infrastructure, ours included. We did see a little bit of higher latency at the high end of things, where the slowest bands of queries got slower, but we think that was mostly the Facebook lookups failing out. And the SERVFAILs, meaning that we are answering "no" back to people who are asking, we were actually able to answer really, really rapidly. So that was fine. We didn't see an appreciable service impact in almost any of our daylight locations, which are the people who were asking the most questions. So that was good. Some other providers, however, because of the way they were structured, handled failures in a different way that was more CPU intensive. And so some providers actually saw a significant increase in lag on their responses, or actually fell over entirely on the DNS. So this is how just one provider in one place caused a cascading failure that has repercussions downstream to other systems. I guess it's more of a proof of how tightly, and I guess "well" isn't the right term, how badly and tightly DNS is integrated with everything. Yeah. And correct me if I'm wrong before you get to the technical details: Facebook has a really short TTL on their DNS records, and that short TTL is part of their strategy to constantly refresh so they can redirect you to what they refer to as the healthiest route. So as they compute healthy routes and keep them on short TTLs, there's a constant stream of queries coming in, which is fine if it's answering the same thing all the time, but then the technical problem becomes that the short TTL means: is it there yet? Is it there yet? Yeah, correct.
Combined with the other side of the technical aspect: you answer a query, but then you're querying back up to the Facebook servers, and depending on how it's configured, you have a certain timeout wait. So there's a timeout wait for the unresponsive Facebook servers, which aren't there anymore because of their routing issues. That slows things down on that side, while the answer side keeps re-asking every few minutes based on this really short TTL, like you said, creating a cascading error that brought down more than just Facebook. Well, that's a trend we've seen, which I think is unfortunate: people are shortening up their TTLs, especially people who think they have highly changeable results. Short TTLs have become kind of a thing, and that works against their interest if there's a failure. A TTL of 5 or 10 seconds or 30 seconds or even a few minutes works great for that kind of load-balancing, load-sharing environment, but it actually works against you if you've got even a transient problem on your back end. The way it works is that the TTL holds an answer in memory for a certain period of time. So a certain percentage of DNS servers are going to have that answer in memory for that period, and they're going to answer back to their clients with the correct answer until the TTL expires. Now, as soon as the TTL expires, you go and ask the back end. If the back end isn't responsive, it times out, and, depending on how long you want to wait for a timeout, that's maybe 15 seconds, in some cases 5 seconds, and the client then receives a SERVFAIL: can't talk to the authoritative server. So now the question is, how long does the client remember the SERVFAIL, and do they immediately ask again of the same recursive resolver, or a different recursive resolver? In other words, how do they handle that?
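To make the TTL mechanics John describes concrete, here's a toy sketch in Python. This is not Quad9's actual implementation; the class, names, and numbers are purely illustrative. It shows the core behavior: answers are served from memory until the TTL expires, and once the back end stops responding, every post-expiry lookup turns into an upstream timeout and a SERVFAIL.

```python
import time

class TTLCache:
    """Toy recursive-resolver cache: answers are fresh until the TTL
    expires; after that, an unreachable back end means SERVFAIL."""

    def __init__(self, upstream):
        self.upstream = upstream  # callable: name -> (answer, ttl), or raises TimeoutError
        self.cache = {}           # name -> (answer, expires_at)

    def resolve(self, name, now=None):
        now = time.monotonic() if now is None else now
        entry = self.cache.get(name)
        if entry and entry[1] > now:           # within TTL: answer from memory
            return entry[0]
        try:
            answer, ttl = self.upstream(name)  # TTL expired: re-ask the authoritative side
        except TimeoutError:
            return "SERVFAIL"                  # back end unreachable, nothing cached to give
        self.cache[name] = (answer, now + ttl)
        return answer
```

The trade-off falls out directly: with a 5-second TTL, a popular name re-asks the authoritative server roughly every 5 seconds per resolver, so an outage is visible almost immediately; with a 1-hour TTL, an outage shorter than an hour might never be visible to most clients at all.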
Most will ask the question of a different recursive resolver. And then if they get two SERVFAILs or three SERVFAILs or however many SERVFAILs are handed back, in other words, if all the recursive resolvers say "I don't know," how long do they keep that? And when do they ask again? Those are interesting questions that contribute to these kinds of failure modes. If they don't wait at all, if they're like, "oh, I got three failures, I'm going to start the cycle again," well, then, yeah, you're going to get a huge influx of new queries, which is, I suspect, some of what we saw in this Facebook failure on Monday. And then the other question is, how long does the recursive resolver operator, in our case, how long does Quad9 hand back the SERVFAIL result before we try again to communicate with Facebook's servers? That is also something that is configurable on the recursive resolver stack. So there has been a lot of buzz in the last couple of days: all right, what do we do as recursive operators? How long do we retain a SERVFAIL? Do we retain it for longer to reduce the overload on Facebook, or whoever the failed service provider is? That, of course, increases the interval of failure, because if we say, all right, we're going to remember the SERVFAIL for three minutes, five minutes, whatever that time is, then even if the service comes back in that window, we're not going to ask the authoritative server. So there's a trade-off between not flooding the authoritative server and getting the service back and operational again. So this is renewing some discussions about how that works. There's also the concept of what's called serve-stale, which is rearing its head again.
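That SERVFAIL-retention trade-off can be sketched the same way. Again, this is a hypothetical illustration, not how Quad9 or any particular resolver is configured; `fail_ttl` and all other names are made up for the example. Caching a failure for `fail_ttl` seconds shields a broken authoritative server from a query flood, but it also delays recovery by up to that long after the server comes back.

```python
class ServfailCache:
    """Sketch of negative caching of SERVFAIL on a recursive resolver."""

    def __init__(self, upstream, fail_ttl=180):
        self.upstream = upstream  # callable: name -> answer, or raises TimeoutError
        self.fail_ttl = fail_ttl  # how long to keep answering SERVFAIL from memory
        self.failed_until = {}    # name -> time until which we answer SERVFAIL

    def resolve(self, name, now):
        if self.failed_until.get(name, 0) > now:
            # Cached failure: no upstream query at all, which protects the
            # authoritative server but hides its recovery until fail_ttl passes.
            return "SERVFAIL (cached)"
        try:
            return self.upstream(name)
        except TimeoutError:
            self.failed_until[name] = now + self.fail_ttl
            return "SERVFAIL"
```

Raise `fail_ttl` and you shed more load off the failing authoritative server; lower it and clients see the recovery sooner. There's no setting that wins both ways, which is exactly the debate described above.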
And serve-stale means that a recursive resolver will get an answer from an authoritative server and keep it in its cache for the TTL. Theoretically, at the end of the TTL, it discards it. But in most practical terms, it won't ask at the end of the TTL; it will ask slightly before the end of the TTL if it's an active name. In other words, if there's a lot of activity on that name, why wait and not have an answer? When you know the TTL is about to expire, you actually go out and ask the authoritative server in advance of the TTL expiring. Anyway, if you don't get an answer, let's say you get a SERVFAIL, meaning I wasn't able to communicate with the authoritative server, then maybe, if your local policy on the recursive resolver indicates this, you can answer with the last previous good answer you got. That's called serve-stale. There are some recursive resolvers that support this and some that don't. And this has been an ongoing discussion: all right, what does serve-stale imply? Because now you're lying, right? You're giving an answer back that isn't within the TTL the authoritative server provided you. But in most cases, the authoritative provider would say, hey, if you're getting SERVFAIL from me, yes, please answer with the last thing you got from me, because that's better than no answer at all. But now this becomes a subjective question: what domains should be served stale, and which recursive resolvers will implement serve-stale? How long will you answer with a stale record? So it's becoming an interesting policy question; there are some discussions about standardization, but it's still very much subjective. And it's crazy for me to think that 20 years ago, I was a mail server administrator, and these are problems we ran into that were exacerbated because everyone cached a lot of things, because 20 years ago the internet was a lot slower.
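Here's what serve-stale looks like as a sketch, modeled loosely on the behavior John describes (in the spirit of RFC 8767's "serving stale data"). The `max_stale` bound and all names here are my own illustration, not any resolver's real configuration: the last good answer is kept past its TTL and handed out only when the authoritative server can't be reached.

```python
class ServeStaleCache:
    """Sketch of a serve-stale cache: prefer fresh answers, fall back to
    the last good (now expired) answer when upstream is unreachable."""

    def __init__(self, upstream, max_stale=3600):
        self.upstream = upstream    # callable: name -> (answer, ttl), or raises TimeoutError
        self.max_stale = max_stale  # how long past TTL expiry we'll still serve stale
        self.cache = {}             # name -> (answer, expires_at)

    def resolve(self, name, now):
        entry = self.cache.get(name)
        if entry and entry[1] > now:              # fresh: ordinary cache hit
            return entry[0]
        try:
            answer, ttl = self.upstream(name)     # expired (or absent): re-ask upstream
            self.cache[name] = (answer, now + ttl)
            return answer
        except TimeoutError:
            # Upstream down: "lie" with the stale record, within a bound,
            # rather than hand the client a SERVFAIL.
            if entry and entry[1] + self.max_stale > now:
                return entry[0]
            return "SERVFAIL"
```

The "lying" in the discussion above is the `except` branch: the answer handed back is past the TTL the authoritative server set, and `max_stale` is the policy knob for how long that lie is acceptable.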
So things were frequently cached with TTLs not respected, and you'd get DNS failures. When we had moved a mail server for a client, it would take way longer than it should have to get the new records updated; providers were like, we're just going to hold these for 40 hours. Now we're starting to flip that, where DNS operators are dropping their TTLs to the point where it's actually hurting them. We are of the opinion that longer TTLs are better, because they reduce the volume of queries, et cetera, et cetera. We're also not big believers in using the DNS to direct content, although I totally understand why that's the case; some providers have to do it that way. I believe, and I'll say that's my opinion outside of the organization, that there are better ways to direct content at a deeper layer inside the web stack, so that you can redirect people to the more appropriate destinations without using the DNS, because the DNS is not a very good signaling model for figuring out where users are. It might be a better model for indicating what servers are available in general, but I don't think it's a very good indicator of where users are, which is very often the whole reason the TTLs are lowered and the concept of freshness becomes an issue: all right, where geographically or politically are you, and that directs the DNS query. That's a whole other conversation. It's really interesting, though. One of the side notes I thought was good in the Facebook write-up is that they were slowly turning everything back on, doing a cold start of a data center, which is not easy and not always in people's playbooks very well, because failover is one thing, but this whole restart they were doing with new routes raises questions like: how many megawatts will it take? And kind of like you said about how long you hold on to a stale answer, they were trying to balance how many megawatts they could bring up at once, because when things come on, requests are going to flood in.
We have a backlog of connections and updates that are going to want to come in, and will this overload certain areas? And that's just... Well, they actually did some of that, and I'm not sure if it was intentional or unintentional. When we saw Facebook come back, when we saw their DNS servers become reachable again, there were some brief moments of connectivity where we saw queries coming back. I was actually watching packet dumps on some of our larger recursive resolver clusters, looking specifically for responses from any of the Facebook authoritative servers. We would see a few come back, and then there'd be two minutes or so of no responses, and then a few responses back, and then a couple of minutes of no responses, and then it started increasing and we'd get valid responses. But we were also seeing REFUSED, meaning we were sending queries out and they were sending us REFUSED responses in return, which, as I said, might have been intentional or might have been unintentional based on their load. So there was some natural curve in bringing things back, or at least in allowing people to even know where Facebook's servers were in order to connect to them. It's complicated because we've made things so big; it almost feels like it shouldn't have been able to scale this big in some circumstances. It's a lot to think about. If you'd asked me 20 years ago, we're going to build a system where 2 billion users log in every day, I'd be like, you're going to do what? Yeah, exactly. And honestly, Facebook's namespace is not the largest on the Internet by any stretch, right? There are much larger namespaces that, if they went down and there was this kind of pathology of rapid re-requests, would be much worse.
So, yeah, I'm not okay with this failure happening, but I'm certainly interested in the lessons it's going to teach everybody on the authoritative DNS side and also on the recursive side. What are we going to take as lessons on how to make this work better in the future? Things like serve-stale, or increased TTLs, or extended DNS errors, which is another one. Actually, that's a new one since we last spoke, which might allow clients to understand a little bit more about why certain failures are happening, so that they don't re-request over and over and over when there's no information that's going to come back. It's always fascinating to me: set aside the name Facebook or whatever problems they may be experiencing, and the technical aspect becomes really interesting, because we're still talking about building these large-scale networks and how we deal with things at this many million requests per second, this many million DNS entries. That's always where the technical hack comes out and we go, how do we really solve this problem? Who cares who it's for? How do we really dig into it, and what policies need to be around it? It's just a fascinating idea to me. Well, there's certainly a lot of effort underway to try to make things scale better. I'm also going to say there are a lot of changes underway in the DNS in particular that will challenge that. Things like encryption, as an example, which is great for privacy but not so great for scale, right? Because now you're layering an additional encryption tunnel, essentially, on top of the DNS. So the scaling is going to be more challenging. And things like Oblivious DoH, which you may have heard comments about, which adds even two more hops into every DNS transaction for those people who implement it. So it's going to be fascinating to see how the reliability and performance fare versus privacy.
And so these are some of the trade-offs, right? The trade-offs of scale versus cost versus privacy, and how you balance those. I'm really pleased to be sitting at the crux of some of those questions with Quad9, because of course we have all of those things: we have the privacy, the scalability, and, as a nonprofit, certainly the cost comes into play as well. So it's a very interesting place to be right now with the DNS, and then with the downstream scaling of what the DNS serves as well. So besides Facebook and the new logo that I have behind me, what are some of the other new things we can talk about over at Quad9? Sure. Well, one of the new things, and you mentioned the logo behind you: in February of this year, Quad9 actually changed where we are headquartered. We changed our foundation status. We were a 501(c)(3) based in the United States, and that organization still exists, but Quad9 is now a Swiss entity based in Zurich. And the reason we did that is really to make life more difficult for ourselves. That doesn't sound reasonable, but as a US entity, there are no national laws in the United States that govern privacy. So as a US-based organization, people in Europe, and actually the rest of the world, would ask us: you say you're doing certain things with privacy, but who holds you to those guidelines? And our answer was, nobody. You're just going to take us at our word, because there's nothing really that prevents us from doing whatever we want with your data. You're just going to have to trust that we're a nonprofit and, you know, believe us. And that doesn't hold, right? That doesn't hold any water over time. Switzerland has one of the strictest data privacy regimes in the world. We have to comply with Swiss data protection laws, which are essentially a superset of the GDPR.
And there are, in fact, laws in Switzerland that make us liable not simply in civil but also in criminal ways if we violate those privacy laws. So Switzerland really holds Quad9 now to a much higher standard of privacy, meaning that what we say we do, we have to do, and there are laws that regulate that. There's no weasel room around it. Additionally, Switzerland has really good laws as far as transparency and how information requests are transmitted to Quad9. There are no secret laws in Switzerland, no secret requests. If someone asks us for information, we're allowed to say that someone asked us for information. We still have to comply with the laws of Switzerland, don't get me wrong, but there are no hidden requests that can be applied to us in Switzerland. That's kind of a secondary, very far trailing issue, because Quad9 doesn't have any data to give, whether we're in the United States or in Switzerland. We don't collect any data. So it's not that big an issue, but it's telling end users: you can trust us, and you don't simply have to trust who we are. We are held by law, and anybody in the world, not just Swiss citizens, can actually sue us if we violate, or if they believe we violate, their privacy. That's another interesting component of Swiss law. If you go to the website, we have a huge section in our privacy policy which describes all of the legal findings we went through with Switzerland. It was a very long, involved process, excruciatingly long with legal, but we thought it was worthwhile, because we are now really the only major DNS provider that can say we're held to a privacy standard by law, not just by our say-so. This is interesting, because this is a concept I didn't know about until I started digging into other cases.
You're like, wow, European law works with a few different concepts that just don't apply in US law. And like you said, no secret laws and things like that. That's really good from a privacy standpoint, and it almost seems unusual; of course my European friends think some of the laws we have around this are unusual. But that's really cool, because I think it gives people that extra confidence they're looking for. Everyone waves the flag of "we do privacy, we do nothing with your data," and then when something happens, you're like, oh, by the way, we had all your data. And there's nothing you can do about it. You can just say, I don't like those people because they lied to me, and that's as far as it goes. But actually moving yourselves there gives you, like you said, a real, not just civil but perhaps criminal, consequence that can come with that particular aspect. So I think that's a really interesting move. It's a double-edged sword, and that's the other news I wanted to talk about a little bit. The double-edged sword is that European law does, of course, apply to us. A specific component of that is the Lugano Convention, which allows European nations, plus a couple of others that are not necessarily in the EU, like Switzerland, to raise civil suits in each other's courts. And there's the other piece of news you'll find on our website: we were presented with a lawsuit in June of this year by Sony Entertainment. Their assertion in the German courts is that by resolving a certain domain name, which happens to point to a server, I believe, in Ukraine, we are indirectly abetting copyright infringement, because that domain name has links which lead to another site that allegedly has downloadable songs that were in Sony's portfolio.
So they asked the German court to apply an injunction on us that holds us accountable for resolving those domain names. In other words, they said, you must not resolve this particular domain. And they could then apply that to us in Switzerland because of this convention. So that was in June. As soon as we learned about it, as soon as the court presented this to us, we did, and currently are, filtering that single domain name for users just in Germany, to prevent them from reaching that site. And we of course have filed an objection to the ruling, which is still in the court process. I'm not going to talk much about why we believe this doesn't apply to us, because, again in the spirit of transparency, we published both the injunction as well as the entire text of our objection on our website. You can go read that in both English and German. But really, I'd like to talk about what happens if we lose, or what happens if we just give up entirely and don't do anything, and if the industry gives up and doesn't do anything. The assertion is that a private organization such as Sony would be able to present another private or public organization, such as a recursive resolver operator like Quad9, with a domain and say: we believe this site is violating some of our rights, in this case copyright, and you must stop resolving it because you are indirectly abetting that site. We believe that would be an extraordinarily dangerous precedent. Even though this court case might not formally set that future precedent, it is certainly a significant portion of what the outcome will be: having one company or one organization or even one individual be able to assert and apply what is effectively censorship against another.
It's not censorship if they believe their rights are being violated, but still, they are effectively stopping access to a remote site which might not even be within their jurisdiction. I think that's a really dangerous thing, and for us as an organization, as an example, it becomes untenable. We can't field hundreds or thousands of requests from companies or individuals saying we need you to stop resolving these domains, without necessarily having any proof of that or any due process. So that's a real challenge, not just for us; we believe it's also going to fall through to anybody else who does anything involving a domain. Safe browsing, firewall operators, anybody who does antivirus work: they could similarly, in our belief, be presented with a list of domains by rights holders or anyone else saying you must stop resolving these. That is why we are so concerned about this particular ruling. It's interesting that this process started within about a month or so of us appearing in Switzerland, and it's interesting that we, as a nonprofit, are the one who seems to be singled out of all the possible organizations doing DNS recursive resolution in Europe; they picked one in a different country that happens to be a nonprofit. So anyway, it's a fascinating case. What we're hoping for, and we have received this, which actually makes me feel really good about working with Quad9, is industry backing; we've also received a huge influx of individual sponsorships. Individuals are contributing 5, 10, 20 dollars or 20 euros or 20 francs to Quad9 in the hopes that it can work in our favor in the lawsuit, and we're really appreciative of all our individual sponsors. But what we're really looking for is also industry support.
If any of you watching this are in the industry doing antivirus blocking or filtering of any sort, you should be very concerned about this, because if it's left to stand, if Quad9 gives up, as an example, we believe this will almost certainly be used as a lever for other types of filtering systems. And if you do business in Germany, or in fact do business in the EU at all, this might have negative repercussions that are pretty significant. It's really concerning when private companies, and it's not the first time these companies have tried to do this, kind of dictate the internet: you can't have this. We've seen this with some of the takedowns and domain seizures, for things where the debate is not as simple as "they're doing something obviously criminal," so to speak. It's a real concern. DNS is pretty far removed, very far removed; recursive resolvers especially are very far removed from the actual infringing content. And German law in particular has a clause whose English translation is "duty of care," which extends in some instances that far, but there are specific call-outs for people who are exempted; telecommunications providers, I believe, is one of them. We think that DNS recursive resolution, and potentially other services in the future, need to be added to that, because this doesn't do anybody any good. This could end up as a very big net negative, not just for Germany but for the EU as a whole, if it becomes untenable to do business in some of these regions because of these burdens; whether the additional work is a financial or an administrative burden, it's all the same. So we're very much opposed to it.
Exactly, because it's not something that makes a ton of sense to me. But yeah, that's some of the big news we've had. We've also got the usual: we've expanded into a whole bunch more cities. I think we have something like 20 or 30 cities in the backlog queue right now that we're slowly turning up, which is great. We have more partners who have given us hosting and server resources, so that's great. South America is going to be on real soon now in a big way; we actually have facilities there, but now we're basically doubling or even tripling our infrastructure there. More points of presence means less latency, and that's what gets people excited about it. And again, there's the political layer on this as well: the more we can keep the data in country, from clients who are asking for things, having that resolved by servers that are very close by in their own nation, that's great for everybody's security. Whether it's encrypted or unencrypted doesn't matter; keeping the answers as local as we can get them to the end users is really, really important, especially in underserved areas like, as an example, Africa, where so much traffic is moved across international borders. The closer we can get things to end users, the better it is for security and privacy. Yeah, it's a win-win all the way around. Yep.
What else is happening? There is actually a lot of stuff happening in the DNS world. A lot of it's kind of boring, or, I mean, I think it's exciting because I'm a DNS nerd, but there are new standards coming out which I think are really fascinating, like Oblivious DoH, which I talked about. Some people are experimenting with DNS-over-QUIC, so that's interesting; we're not there yet, we're waiting a little bit more for the standards to settle. But extended DNS errors is one that we're very interested in and will start looking at shortly. Right now, whenever you do a request to Quad9 and we block it because it's on our block list of however many millions of domains, we give you back an NXDOMAIN, which is great: it protects you against the threat. But it's kind of an error, and you look at it and go, well, okay, it couldn't resolve the domain. That's kind of a lie, and of course we are lying: we are answering that we can't resolve it, but it doesn't give you any additional information. Extended DNS errors is a new draft that's come out in the last year that essentially allows us to tag that NXDOMAIN with more information. Some of it's very simple; it's just a numeric code, like "DNS failed," which happens sometimes, or "blocked." So when we block a domain, you get some additional context, or your client at some point in the future will get some additional context and be able to surface it up to the end user. So instead of seeing just "host not found," they'd see "your DNS service provider blocked this domain due to security reasons." That would be really, really useful, and we're hoping to be able to include some of that in the future here with this extended DNS error set. There's also a lot of other stuff we can now put in that EDE response that is much more informative. So there's going to be some interesting stuff happening with that in the next couple of years.
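To make that concrete, here are a few of the Extended DNS Error info-codes from RFC 8914, along with a hypothetical helper showing how a client could turn an (rcode, EDE) pair into a user-facing message instead of a bare "host not found." The helper function and message formatting are my own sketch; the numeric codes and their names come from the RFC.

```python
# Selected Extended DNS Error (EDE) info-codes from RFC 8914.
EDE_CODES = {
    3:  "Stale Answer",    # served past its TTL (serve-stale)
    4:  "Forged Answer",   # answer was synthesized by policy
    15: "Blocked",         # blocked by the resolver's own policy (e.g. a threat feed)
    16: "Censored",        # blocked by an external requirement (e.g. a legal order)
    17: "Filtered",        # blocked at the client's request (opt-in filtering)
    23: "Network Error",   # couldn't reach the authoritative server
}

def describe_response(rcode, ede_code=None):
    """Hypothetical client-side helper: surface the EDE context to the
    user instead of a bare NXDOMAIN/SERVFAIL."""
    if ede_code in EDE_CODES:
        return f"{rcode} ({EDE_CODES[ede_code]})"
    return rcode  # no EDE attached: the bare rcode is all we know
```

With something like this in an OS or browser, a security block could render as "NXDOMAIN (Blocked)" rather than a generic lookup failure, which is exactly the distinction John is describing.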
I'm hoping some of the device manufacturers or operating system manufacturers are taking that to heart and will actually surface it to the user, and that would be great, because then we don't just say "it's DNS"; we say "it's DNS, and here's the answer we have, and here's why it's broken." Right, because right now, functionally, there are only two different errors you get back in DNS. There's NXDOMAIN, which is "host not found," meaning I know this domain exists but the thing you're asking for in particular doesn't exist, and we know that's the case. Or SERVFAIL: I couldn't talk to any of the resources I need to ask the questions. Those are really weak signals. I mean, they're terrible, and we need to do something better. So I'm really hopeful that EDE is something everyone starts to adopt and display up to the user in the next couple of years. Well, that's awesome. Was there anything else we have to cover today? Sure, but no, I'm good. As always, there are a million different conferences and discussions and things I'd love to talk about, but I think those are some of the top-level things for Quad9. I'm really thankful for your time today. I will link down below to Quad9, obviously, if you haven't heard of them, and I'll link to the previous video we did, where we talked pretty long about a lot of details about DNS, including the entire history of Quad9, which is pretty awesome. And if you didn't know, as we mentioned earlier, there are no logs or anything; they're extremely privacy oriented, and now they've taken it a step further by putting the shackles on themselves and saying, we don't just say it, we're in trouble if we don't put our money where our mouth is, as they say in English. I love it. Well, thanks, John, this has been awesome; thanks for joining and for your time. See you next time. And thank you for
making it to the end of this video. If you enjoyed this content, please give it a thumbs up. If you'd like to see more content from this channel, hit the subscribe button and the bell icon. To hire us for a project, head over to lawrencesystems.com and click on the Hire Us button right at the top. To help this channel out in other ways, there's a Join button here for YouTube and a Patreon page where your support is greatly appreciated. For deals, discounts, and offers, check out our affiliate links in the descriptions of all of our videos, including a link to our shirt store, where we have a wide variety of shirts and new designs come out, well, randomly, so check back frequently. And finally, our forums at forums.lawrencesystems.com are where you can have a more in-depth discussion about this video and other tech topics covered on this channel. Thank you again, and we look forward to hearing from you. In the meantime, check out some of our other videos.