Welcome, everyone, to this talk by Robert Edmonds about DNS in Debian.

Hello. Can you hear me? All right, let's get started. So what exactly is the DNS? These four definitions are from a pending RFC draft, and they're really quite opaque. They're very jargon-filled: definitions by DNS protocol geeks for other DNS protocol geeks. To an application developer, someone who just wants to use the DNS incidentally, these are very lofty definitions. I'm not going to read them all out.

So here's my second slide, the more nuts-and-bolts definition. The DNS dates back to the early 80s, the early experimental era of the internet, and it co-evolved with the other early protocols from that era, like SMTP. The big reason for it was to move static configuration data into, well, they didn't call it the cloud then, but that's essentially what they were doing with the HOSTS.TXT file. There was a lot of really weird stuff around mail in this era, and this eventually evolved forward. It's now used by every operating system that connects to the internet, obviously including Debian.

Most DNS talks focus on the infrastructure side, the jargon-filled side of the DNS where you have terms like recursion, delegation, zones, and servers. This talk is going to focus more on the application side. We're also going to talk about the infrastructure, but I wanted to make this talk more general and talk about programs that actually want to make use of the data in the DNS rather than the plumbing of the DNS: specifically, how does the DNS satisfy these application needs?

Here's what the DNS architecture looks like if we oversimplify it and put it in boxes and diagrams. On the top row we have the jargon. Toward the left side of the diagram we have the more application-focused parts of the architecture; toward the right-hand side you have the more infrastructure-focused stuff.
Every Debian machine has a stub resolver, and the applications that talk to that stub resolver want to do DNS lookups. The stub resolver then has to send a DNS query to an upstream name server, a recursive server: that's the nameserver option in your resolv.conf file. These recursive servers have the full algorithm, data, and cache needed to talk to the actual authoritative name servers on the internet, where the zone data is stored. The green arrows are DNS queries going over the network; the red arrows are the responses coming back. The end goal for the system as a whole is to get the zone data over on the right-hand side back to the application, which wants to look up a hostname so it can make a TCP connection or send email, that kind of stuff.

The DNS data model is pretty simple. It's almost a super simple RPC. It has one method, which is to look up a key and get a set of values back. The lookup is actually sort of twofold: a key and a type. We have various restrictions on these parts of the data model. Keys are limited to 255 bytes, and there are various syntax restrictions. The type is a 16-bit integer, and there are a number of well-defined types that have mnemonics: type 1 is A, type 28 is quad-A (AAAA), these familiar, well-known mnemonics. Values technically have a limit of up to 64k of data, and the actual layout on the wire for these values is usually rigidly defined by the specification documents.

In practice, there are very few well-known types; I believe there are fewer than 100 out of 64k. Most creative uses of the DNS tend to reuse an existing type rather than go off and define their own. I believe part of this is due to the existing APIs and limitations that we've built up over the years. Here we have some trivial examples of taking a key and a type and getting back a set of values: A, AAAA, and MX records. Basically these are just the documentation examples.
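To make the "key plus type" lookup concrete, here is a minimal sketch in Python (standard library only) of how a name and a type mnemonic could be encoded as the question part of a DNS query. The names `QTYPES`, `encode_name`, and `encode_question` are hypothetical and chosen for illustration. The MX type code (15), the IN class value (1), and the 63-byte per-label limit are not from the talk; they are the standard values from the DNS specifications.

```python
import struct

# Type mnemonics: A = 1 and AAAA = 28 are mentioned in the talk;
# MX = 15 is the standard value, added here as an assumption.
QTYPES = {"A": 1, "MX": 15, "AAAA": 28}
CLASS_IN = 1  # the Internet class (assumption: not mentioned in the talk)


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    labels = [part.encode("ascii") for part in name.rstrip(".").split(".") if part]
    for label in labels:
        if len(label) > 63:
            raise ValueError("label longer than 63 bytes")
    wire = b"".join(bytes([len(label)]) + label for label in labels) + b"\x00"
    if len(wire) > 255:
        raise ValueError("name longer than 255 bytes on the wire")
    return wire


def encode_question(name: str, qtype: str) -> bytes:
    """Encode the (key, type) pair: wire-format name, 16-bit type, 16-bit class."""
    return encode_name(name) + struct.pack("!HH", QTYPES[qtype], CLASS_IN)
```

For example, `encode_question("example.com", "A")` yields `b"\x07example\x03com\x00\x00\x01\x00\x01"`: two length-prefixed labels, the terminating zero byte, then the 16-bit type and class. A complete query would also need a header, which this sketch leaves out.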
Getting into application use of the DNS in Debian: I've looked, not exhaustively, of course, at the software in Debian that uses the DNS, and there's a lot of it. I've mostly ordered the options from most to least popular, I guess.

The most common way for applications to look up data in the DNS is through a function called getaddrinfo. Usually they don't actually care that it's the DNS returning the data they're looking for, which is a good thing for things like address family independence. We want applications to work transparently on IPv4 or IPv6, so the application shouldn't be specifically selecting a particular A or AAAA query type. This is probably the wrong interface if your goal is to interact with the DNS data model per se, but most applications are actually doing hostname lookups, and this is the right interface for them.

There's another big drawback to this interface: it's a blocking interface. That's okay for command-line applications of the ping or wget type, but it's very bad for things like web browsers that need to be highly concurrent and highly responsive. There's no standard asynchronous version of getaddrinfo.

The plumbing for this interface is kind of interesting. If you've never heard of it before, the name service switch is an abstraction layer deep in the C library, glibc. You can run the getaddrinfo function itself with a command-line driver called getent. Usually in /etc/nsswitch.conf you'll have a hosts line with one or more parameters, and almost everyone has at least dns. This is the default that we've used for years and years; it actually goes out and does DNS lookups and returns the results through the getaddrinfo interface. There is an alternative NSS module that also does DNS lookups, called resolve, which isn't in Debian yet and may in the future be provided by systemd.
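As an illustration of the getaddrinfo interface discussed above, here is a small sketch using Python's standard `socket` wrapper around it. The helper names `lookup` and `lookup_async` are hypothetical. The asynchronous variant relies on `asyncio`'s `loop.getaddrinfo`, which by default runs the blocking call in a thread pool; that is one common workaround for the blocking behavior, not a standard asynchronous getaddrinfo.

```python
import asyncio
import socket


def lookup(host, port):
    """Blocking, address-family-independent lookup via getaddrinfo."""
    results = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Each result is (family, type, proto, canonname, sockaddr); the caller
    # never asks for A or AAAA explicitly, it just gets usable addresses.
    return [(family, sockaddr[0]) for family, _, _, _, sockaddr in results]


async def lookup_async(host, port):
    """Same lookup without blocking the event loop (runs in an executor)."""
    loop = asyncio.get_running_loop()
    results = await loop.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return [(family, sockaddr[0]) for family, _, _, _, sockaddr in results]
```

A call such as `lookup("localhost", 80)` goes through the name service switch, so the answer may come from /etc/hosts rather than the DNS, which is exactly the indirection described above.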
This dns NSS module is essentially an adapter between the front-end getaddrinfo porcelain and the back end, the actual DNS plumbing, which is provided by a library called libresolv, shipped in glibc. Then we have resolve, which is the systemd rewrite, the future replacement or future option. It also does DNS, but it additionally implements a couple of other DNS-like protocols, mDNS and something called LLMNR, and it actually offers a D-Bus-type implementation.

The second option for an application is to call directly into libresolv, the glibc library, which was imported from very old versions of BIND starting in the early 90s. It looks like the last actual merge was 15 years ago, from BIND 8. Other OSes have also imported this code: the BSD variants, obviously, and Solaris, I believe. This is quite an old bit of code. The interfaces are really low level; some of them are documented, some are not. Probably the best reference is an actual book, DNS and BIND from O'Reilly. These interfaces require a fair bit of mastery of the DNS protocol.

There are a lot fewer uses of libresolv than of getaddrinfo in the archive, but there are a couple of good examples. One is Postfix, which uses both. It uses getaddrinfo, which is the proper interface for making socket connections to SMTP peers, and it also uses libresolv for a variety of other purposes. A modern mail transfer agent typically does a lot more than just A and MX lookups, so it interacts with the DNS at a level that requires the lower-level libresolv interface. There's also a program called wrapsrv, which I wrote, which uses libresolv exclusively. My experience with this program is that people probably shouldn't have to use libresolv if they want to interact with the DNS.

The third option is third-party libraries.
If you have a use case that can't be served by the general-purpose getaddrinfo interface and you're not a big fan of libresolv's very low-level, very crufty interface, you might want to consider a more modern third-party library, especially if your use case requires a lot of asynchrony. If we look at the available third-party libraries that do DNS lookups, a lot of them, such as adns, c-ares, libevent, and libuv, are very focused on the asynchronous use case, almost to the exclusion of other opportunities offered by the DNS data model.

The big 800-pound gorilla of third-party DNS libraries is something called ldns. It's fairly comprehensive. If you look at the symbol table of the library, it has over 800 public symbols, and these are just functions. This is a ridiculously large library. If you're building DNS servers, this is probably the library you want to use, but it's probably overkill for a lot of simpler uses. It's a C library. It does have Python bindings, but they're SWIG-based and I've never been able to figure them out.

There's another library called libunbound, from the same vendor as ldns. It's sort of like putting an Unbound server inside your application and running it. Unbound is a fairly lightweight DNS server, but embedding the whole thing into your application is a fairly heavyweight thing to do. If you absolutely need DNSSEC validation inside your process, or if you need asynchronous caching, this is probably the third-party library that you want to use. It also has SWIG-based Python bindings.

getdns is another option. This is probably the most recent and most comprehensive attempt at the application use case. It intentionally focuses on the porcelain, the DNS porcelain for application programmers. It has a nice modern API, and it has async support.
They've taken a fairly formal approach and split it into an API specification and a reference implementation of that specification. Ideally, in the future, there will be additional conforming implementations of that spec with API compatibility. There is a Python binding, and I found it was not SWIG-based, so I suspect I'll be able to learn it. It's very new; it's not in Debian stable. That's pretty modern, I guess. That would still be called stable backports. Yeah, of course. I looked at that one previously.

The fourth option is the comedy option: none of the available options suit you, so you just do everything yourself and integrate it directly into your source tree. The Chromium browser does that. It's a web browser, and it makes certain demands of the network, and the vendor has decided it requires very great control over the DNS resolution process.

Here's our summary of these four options. getaddrinfo has low flexibility, but it's very portable and available everywhere. Then you go up to libresolv: you take on additional complexity, and it's more low level. A lot of OSes have it, but you probably have to adjust your build system to be able to access it, and it goes by different names on different platforms. A third-party library, whichever one you like best, is going to give you the most flexibility, but it's obviously going to introduce an extra dependency into your application. Do it yourself: frowny face.

Let's switch gears a bit and talk about security, and switch over a bit to the infrastructure side of the DNS. We have the classic information security triad of confidentiality, integrity, and availability. If you've not heard of this concept before, you can check out the Wikipedia article on it, I guess. These are the classic properties of security. Within the DNS, the approach to security has traditionally focused on integrity. You want to prevent malicious servers from being able to poison your cache with anything they want.
A lot of work went into that problem in the 90s, and there's a formal ranking of trustworthiness based on where the data came from, which determines whether it outranks another source of data or not. Those were non-cryptographic rules, and people realized it would be good to be able to cryptographically authenticate the data as well. That would be the ultimate tier of trustworthiness. This work got started in the early to mid-90s or so, and then a lot of years passed. They completed the core specifications around 2005 and kept making more specifications after that, which eventually culminated, after a number of years of policy and governance work, in the actual signing of the root DNS zone with a DNSSEC key. That's trickled down to secure delegations for a lot of gTLDs and ccTLDs that now support DNSSEC, and the ICANN new gTLD program mandates that all of these new gTLDs must offer DNSSEC support. So the number of TLDs that support DNSSEC is now kind of lopsided due to that explosion.

Some protocols, like HTTP, don't currently make significant use of DNSSEC. The browser vendors have decided to focus on more TLS-oriented security measures, things like HTTP Strict Transport Security, and a variety of other ways to securely distribute not just authentic data, but also get cryptographically private communications between endpoints. The problem with DNSSEC here is that it only offers authentic signing of the DNS records that are used to make HTTP connections.

However, I feel that other protocols, like SMTP, have a much better use case for implementing DNSSEC support, and the reason is the level of indirection in the MX record. The MX record determines where your mail goes. It lists the mail exchangers, the destinations for SMTP messages. And the MX records don't actually tell you the addresses of the SMTP servers.
They're a list of names of other servers, and these resource records aren't actually authenticated. So if you validate the TLS certificate of the SMTP server that you connect to, the way you would do that is usually based on the common name field in that certificate, and the value for that is entirely controlled by the MX record, which is unauthenticated. So this is a totally worthless thing to do if you're getting potentially poisoned data back from the DNS. So I think Postfix is really gung-ho on DNSSEC. If you look at the latest versions, they've got a significant amount of new functionality; take a look at the TLS documentation for Postfix.

Looking specifically at DNSSEC in Debian, we're using this old glibc stub resolver from the 90s. It's been maintained since then, but it hasn't seen an overhaul that would allow it to easily support DNSSEC. The way most wiki-type, tutorial-type guides get around the fact that there's no validating stub resolver is to recommend installing a local DNS server. So if your client doesn't support security, just install a local server on your machine and point the client directly at that localhost server. That's the direction Fedora seems to be moving in; they've got a proposal to do that by default. The big problem with this is that if you're on a network that causes DNSSEC to fail, you don't want to be unable to browse anything. So there's a utility called dnssec-trigger, and Fedora has done a significant amount of dnssec-trigger machinery and integration work to try to reach this DNSSEC-by-default goal on their platform.

So an open question here is: should we have something like that, perhaps as an installation option, in Debian? I'm the maintainer of the unbound package. If you install Unbound, it will by default give you a DNSSEC-validating server.
I've been using it for a number of years, and it's currently my preferred validating DNSSEC server. It's possible that this could be an option. I don't think I would recommend enabling it by default anytime soon. I think we should wait and see how this works out in Fedora and move appropriately, I guess.

In Debian, we also have a package called dns-root-data. This is basically just a data package, very roughly like the ca-certificates package. It's got copies of these parameters called the root zone hints and the root trust anchor that control where the... Sorry? Okay. So there are these parameters in this package. Some of this stuff gets hard-coded into various DNS servers and software. If we add even more DNSSEC libraries, we might want to have this centralized to reduce the pain of these rollovers.

Okay, I guess that's the end of the time slot.