Our next speaker is Lynn Root. She doesn't need any introduction. She's a member of the Python Software Foundation board and founder of PyLadies, and she's going to be speaking to us about using and creating DNS servers with Python. Thank you very much. So yes, my name is Lynn Root. A little bit about me. I am a back-end engineer at Spotify, making some awesome APIs for external folks to consume. I'm also a leader slash founder of PyLadies of San Francisco and a board member of the Python Software Foundation. Since I have a stage, I just wanted to promote PyLadies a little bit. If you'd like to start a local PyLadies chapter or know someone who's interested, you can literally pip install pyladies, and it will contain everything needed to start your own location. All right, so back to this talk. As a nerdy person building many side projects, I have had many frustrating experiences when setting up deployment. As I'm sure many of you have experienced, the very first git push to Heroku never works. So I'm assuming you wouldn't be here if you hadn't experienced that pain. You follow the directions on your host's website to properly set up DNS records, but somehow it still doesn't work. We've all been there without a solid understanding of DNS. Oftentimes we just try something, see if it works, deploy, see if it propagates, and hope for the best. But that's not really a solid way to debug. So curiosity got the best of me. I've previously done pretty deep dives into technologies that I only had a cursory knowledge of, like Kerberos. So naturally I decided to dig into DNS, pun intended. I knew that DNS was the Internet's phone book. Sure, it's the backbone of the Internet, and I knew the cloud was built on DNS and duct tape, and that DNS is actually built on duct tape too. But that's pretty much all I knew. So in this talk I'll be going over what exactly DNS is and why it's important, what you can do with DNS, and a handful of awesome things I learned along the way.
So in case you miss anything, this is the one link that you need to know; it has links to everything I talk about, and I'll show it again at the end of the talk. Note that I'm probably going to fill the full 25 minutes with no questions, so maybe buy me a beer after this talk to ask me questions. Anyways, diving in: what exactly is DNS? Why is DNS important? It allows you to visit productive websites like Reddit, receive critical emails from Groupon or Gilt, deploy your one-of-a-kind to-do app, or keep your corporate meme generator from being accessible to non-employees. I don't know why you wouldn't want to hide your secrets or anything. Anyways, getting up to speed on what exactly DNS is: it stands for Domain Name System, and it's widely referred to as a phone book, translating human-readable names into computer-friendly addresses. The textbook definition is that DNS is a distributed storage system for resource records. A DNS resolver or authoritative server stores these records in its cache or local zone file, and a record contains a label, a class, a type, and some data. So essentially, a resource record is the data structure specific to DNS. All right, so the textbook definition is kind of boring. Let's see it in action a little bit. Of course, I wanted to use a little bit of Python. My latest crush has been Scapy. For those of you who don't know, Scapy is a very rich Python wrapper around tcpdump, kind of like tshark or Wireshark. And so here I'm using Scapy to sniff DNS traffic as I'm browsing the interwebs. I use Scapy's sniff function to pick up my local traffic, and I've filtered on the UDP protocol and port 53 to hone in specifically on DNS traffic. And I only wanted 10 packets, or datagrams. So as I let this sniff function run, I went to my browser and typed in roguelynn.com.
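The label, type, and class structure mentioned above maps directly onto the DNS wire format. Here is a minimal sketch of encoding just the question section with the standard library; it's simplified (a real query datagram also carries a 12-byte header in front of this), but the length-prefixed labels are exactly what shows up in a capture:

```python
import struct

# QTYPE and QCLASS constants from RFC 1035
QTYPE_A = 1    # an IPv4 host address record
QCLASS_IN = 1  # the Internet class

def encode_question(name, qtype=QTYPE_A, qclass=QCLASS_IN):
    """Encode a DNS question: each label is prefixed with its
    length, the name ends with a zero byte (the root label),
    and then come 16-bit type and class fields."""
    out = b""
    for label in name.rstrip(".").split("."):
        out += struct.pack("!B", len(label)) + label.encode("ascii")
    out += b"\x00"  # the empty root label terminates the name
    out += struct.pack("!HH", qtype, qclass)
    return out
```

So `encode_question("python.org")` produces `b"\x06python\x03org\x00\x00\x01\x00\x01"`: six bytes of "python", three of "org", the root, then type A, class IN.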
And what I found was pretty cool. As you can see, as I was typing roguelynn.com into Chrome's address bar, a DNS query would take place for every autocomplete guess that Chrome took. So at first it queries www.google.com, because the address bar is also a search function. And then as I type "r", it autocompletes to Reddit, because it's one of my most visited websites. And you can see the DNS query on the second line, and the answer on the following lines. Then as I type "ro", Chrome guesses the page for my awesome How to Spy with Python talk, which, by the way, if you're here for PyData, you should come see on Saturday. And we can see the related queries that it points to. And then finally, as I type "rog", it finds roguelynn.com, and I press Enter with Chrome's autocompletion. And so I guess this is more of a cool thing that Chrome does to speed up possible search queries, using DNS to do so. But notice one thing here, I don't know if you can quite see it, but there's a trailing dot at the end of the host names. I'm sure a few of you know that this exists and that it's kind of how DNS does things, but why is it really there? The difference between a trailing dot and none is the same as the difference between absolute file paths and relative file paths. And like relative file names, which I'm sure we're all familiar with, a relative name can be mangled or mapped incorrectly. Depending upon how your local DNS is set up, if "search example.net" was in your resolv.conf and you navigated to example.com, the DNS query would take the URL to not be fully qualified and therefore look up example.com.example.net. If you navigated to example.com. (with the trailing dot), DNS would not apply that search path defined in resolv.conf. So if there is a dot at the end, it's an unambiguous, fully qualified domain name, and not prone to search path spoofing. I did not put a dot while navigating to roguelynn.com.
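That search-path behavior can be sketched in a few lines. This is a deliberately simplified model of what a stub resolver does with resolv.conf's search list (real resolvers also consult an ndots setting that decides whether the literal name is tried before or after the search domains, which I've skipped here):

```python
def candidate_names(name, search_domains):
    """Return the lookup candidates a stub resolver might try.
    A trailing dot marks the name as fully qualified, so the
    search path is never applied to it. Simplified: ignores
    the ndots heuristic that real resolvers use for ordering."""
    if name.endswith("."):
        return [name]  # absolute name: use as-is
    # relative name: qualify with each search domain, then
    # fall back to the name itself taken as absolute
    return [f"{name}.{d}." for d in search_domains] + [name + "."]
```

With `search example.net` configured, `candidate_names("example.com", ["example.net"])` tries `example.com.example.net.` first, exactly the mangling described above, while `candidate_names("example.com.", ["example.net"])` goes straight to the real site.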
Chrome actually assumes the dot, because it's kind of not user-friendly to always have to put the dot at the end. All right, so continuing on, handling all these DNS queries got me a bit more curious. What is the route that a DNS query takes to finally get an answer for where roguelynn.com is hosted? This is not exactly easy to figure out. Once the DNS query hits my wireless router, it's a bit of a black box where the query is forwarded to if it's not cached. I know that my computer's DNS is set to 192.168.1.1, which is my router. And then my router's DNS is set to this IP address, which I figured out by logging on to the admin page of my router and actually hacking in, because I totally forgot the password to my router. So if I do a host query on my router's DNS, I get a pointer to a comcast.net subdomain. Now if we do a whois on the IP, I see that Comcast, my internet service provider, owns these IP addresses. But beyond that, I don't know if Comcast's DNS has roguelynn.com cached, and if not, where the query goes after that. But DNS is also hierarchical, and getting familiar with the dig command can help us understand at least how these queries are resolved. The dig command has a trace flag that makes iterative queries to resolve the name being looked up. It will follow referrals from the root servers, showing the answer from each server that was used to resolve the lookup. So trying this out on python.org, and I apologize for the small print, if we run dig on python.org with the +trace flag, you can see the path that it took. You see the start at the dot, the root servers, and it goes down to org, then python.org, and then finally we have an IP address at the end. Now, for the more visual learners, I made some nice graphs. So the dig query starts at my local DNS, 192.168.1.1, where, if the answer is not cached, the query is then passed on to the root server.
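The referral-following that dig +trace shows can be mimicked with a toy zone tree. This is purely illustrative, a sketch under heavy assumptions: three pretend name servers, one delegation each, and python.org's address taken from the dig output; real resolution involves lists of servers, glue records, and retries:

```python
# Each toy "server" maps a name suffix to either a referral
# (the next server down to ask) or a final A-record answer.
ZONES = {
    ".": {"org.": ("referral", "org.")},
    "org.": {"python.org.": ("referral", "python.org.")},
    "python.org.": {"python.org.": ("answer", "140.211.10.69")},
}

def trace(qname):
    """Follow referrals from the root down, like dig +trace,
    returning the answer and the chain of servers asked."""
    server, path = ".", []
    while True:
        path.append(server)
        for suffix, (kind, data) in ZONES[server].items():
            if qname.endswith(suffix):
                if kind == "answer":
                    return data, path
                server = data  # referral: ask the next server down
                break
        else:
            return None, path  # nobody knows this name
```

Calling `trace("python.org.")` walks root, then .org, then python.org, which is the same hierarchy the graphs show.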
The query from my local DNS for python.org first asks the root server, the dot, which knows that one of these hosts should have the information. So the root name server responds with "try one of these hosts", which correspond to the .org name servers. The .org name server receives the query and says something like, well, try one of these hosts, which correspond to the python.org name servers. And the python.org name server says, yep, I seem to have an A record for python.org, and it's 140.211.10.69. And that's sort of the complete hierarchical path for python.org. But what if we wanted to know more about subdomains like hg.python.org or others? If we do a dig command against hg.python.org, we actually get a CNAME record that points to osuosl.org, which is a different name record elsewhere. Again, sorry for the lots of print, but I'm sure that python.org has more records set up, not just the one CNAME that we know of. If we're curious, we could run the dig command against python.org with the "any" flag, which is up here. And unfortunately, not much came back beyond A and NS records. Looking at pyladies.com, it's a little bit more interesting, with SOA records pointing to name.com, MX records pointing to Google, and our A record pointing to our web host. What you won't get using "any" with dig is a full zone file or DNS setup, like all our CNAMEs. And I'll get a bit more into that later on. So we can easily resolve the path that DNS would take to look up python.org or pyladies.com. However, it's not the most efficient way that DNS can respond to queries. The root and top-level name servers, like the ones for .org, would be inundated with many requests. And this is where DNS caching comes into play. So when a DNS resolver or authoritative name server receives a query, it searches its local cache for a matching label. If there is no matching label in its cache, the server may instead do one of two things, depending upon how it's configured.
It may either retrieve from its cache and return a referral response containing a resource record of the name server type whose label is closer to the query, however "closer" is defined. Or the DNS resolver may itself initiate the same query to an authoritative DNS server responsible for the domain name that is the subject of the query. The authoritative name server can respond with an answer, a referral, or a failure response. That response is then accepted by the DNS resolver, and if configured to, it will most likely store it in its cache. So if my local DNS server did not have a cached record of python.org, it could send the DNS query to the root DNS server and get pointed to the name servers that handle the .org domain. But since I have visited many .org sites, my DNS most likely has the .org name servers cached, so it can just skip that first root query and go straight to .org. It sort of trickles down from there. Very simplified. All right, so DNS caching sounds all great and hunky-dory until you get to propagation. Propagation is how long one has to wait for DNS changes to take effect. A DNS server will hold a record for as long as its TTL, or time-to-live, number, at which point it deletes it. After it's deleted, if someone makes a request that refers to that deleted record, the DNS server will have to go through the whole process again: query the root server, and then cache the result. Now, set too high of a TTL, and local and ISP caches will last longer, and therefore your friend may not be able to see your glorious to-do app immediately after initial deployment. Set too low of a TTL, and your server may or may not have the ability to deal with the higher query load. Another pain point of propagation and TTLs: some ISPs completely ignore the TTL and set their own expiry for records. Caching also opens up the ability for poisoning. This is by far not my area of expertise. But as I understand it, DNS poisoning works like so.
If the server doesn't validate DNS responses, for example via the DNS security extensions, someone could exploit that by essentially spoofing an IP address that he or she owns for a given hostname, forcing visitors of that hostname to be directed elsewhere. To be able to spoof a DNS entry, an attacker would have to create a response faster than that of the legitimate authoritative server. You can effectively DDoS a DNS server with queries for probably-not-cached entries, giving the attacker many opportunities to respond with fake DNS responses. So while the random domains that get cached are not very useful in themselves, the attacker would be able to give nefarious responses from their illegitimate name server for the domain they desire to compromise. All right, so TL;DR: DNS is essentially a black box. Since you cannot simply spin up a DNS server yourself and be connected to the global DNS network, it's particularly hard to debug and get your hands on to take apart. So essentially the takeaway is that DNS is hostnames mapped to IP addresses, like reddit.com to its IP. And it's essentially set up like a hierarchy, so if your local DNS server doesn't have Reddit cached, it knows where to go to find that record. And lastly, caching and propagation are probably why you want to rage-quit when deploying your sites. But DNS is pretty awesome. There are a few ways to interact with DNS, as well as ways to use this dinosaur of a technology, that I found cool. So, starting off with interacting with DNS. Previously, when doing the dig commands with the "any" flag that I showed, you may have noticed that we couldn't really see any CNAME records that map things like www.pyladies.com. You can certainly run dig commands against www.pyladies.com, but how many other subdomains are there? Being able to look up a full DNS zone file is rarely allowed unless you're an admin yourself. So there's this handy little tool called dnsmap that literally brute-forces subdomain lookup.
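The guess-and-check approach dnsmap takes is simple enough to sketch. Here's a toy version; the `resolve` callable is injected so the sketch stays self-contained, but in real use it could wrap something like `socket.gethostbyname` with `socket.gaierror` treated as NXDOMAIN:

```python
def brute_force_subdomains(domain, wordlist, resolve):
    """dnsmap-style brute force: try each word in the list as a
    subdomain and keep the ones that resolve. `resolve` is any
    callable returning an IP for a hostname, or None on NXDOMAIN."""
    found = {}
    for word in wordlist:
        host = f"{word}.{domain}"
        ip = resolve(host)
        if ip is not None:  # the name exists: record the hit
            found[host] = ip
    return found
```

One lookup per word, sequentially, which is exactly why the real tool is slow without multi-threading, and why its results are only as good as its word list.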
So I tried this on PyLadies, and as you can see it returned four results. This tool is limited to its built-in word list, though you can also supply your own, so don't exactly expect the results to be comprehensive. As an admin, I know that PyLadies has over 30 subdomains, and I only got four here. And it's not that fast, because it does the searching based on the built-in or provided word list one at a time, with no multi-threading. Next, when I was playing around with DNS, I wanted to figure out what exactly was cached on my local DNS. For at least OS X, you can see what's cached by literally killing the process, which flushes the cache to the syslog. So if you take a look at my syslog, we can see some familiar records. Or at least I know that they're familiar, because that was me running dnsmap against spotify.net. And I got more than 300 responses. So just for fun, this is a captured packet from when I was sniffing the traffic generated by dnsmapping Spotify. You can see the Ethernet layer, and then the IP layer, where it went from my computer to my router. And dnsmap was trying out "zr" from its word list. And so you can see the question record down here, and then the name, and then the type and the class. The response to this was a "no domain", because it doesn't actually exist. Now, we're at a Python conference, and I hear you say: not enough Python. Don't worry, I've got you covered. You can create your own DNS server with Python, and I chose Twisted, naturally. Here is a simple DNS server from the Twisted docs. We can run this, fire up Scapy, and run dig against the Twisted server for python.org. So we can see that this datagram captured with Scapy is the query digging through the Twisted DNS server. And you can see the question, the name, type, and class. And then here is the response datagram with the DNS resource record associated with it, including the original question and then the type of record, TTL, data, and resource record name.
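Those answer fields (type, class, TTL, then a length-prefixed blob of data) can be pulled apart with nothing but the stdlib. A minimal sketch, assuming the answer's name is a compression pointer back into the question, which it almost always is in real responses; the bytes in the test are hand-crafted for illustration, not a real capture:

```python
import struct

def parse_answer_rr(payload, offset):
    """Parse one answer resource record at `offset` in a DNS
    message. Assumes the record name is a 2-byte compression
    pointer (top two bits set). Returns the fields plus the
    offset where the next record starts."""
    (name,) = struct.unpack_from("!H", payload, offset)
    assert name & 0xC000 == 0xC000, "expected a compression pointer"
    # 2 bytes type, 2 bytes class, 4 bytes TTL, 2 bytes RDLENGTH
    rtype, rclass, ttl, rdlength = struct.unpack_from(
        "!HHIH", payload, offset + 2)
    rdata = payload[offset + 12: offset + 12 + rdlength]
    return rtype, rclass, ttl, rdata, offset + 12 + rdlength
```

For an A record (type 1), `rdata` is the four bytes of the IPv4 address, so `".".join(str(b) for b in rdata)` gives the dotted quad you see in the Scapy output.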
All right, so those were three interesting ways to interact with DNS. I showed you how you can use dnsmap to find subdomains via brute force, even though it might not be exhaustive; how to find your local DNS cache, at least on a Mac; and how to run your own Python DNS server with Twisted. Now, time for interesting ways to use DNS. So, DANE. DANE stands for DNS-based Authentication of Named Entities. It's basically a protocol that allows certificates to be bound to DNS names using the DNS security extensions. It's sort of similar to the two-factor authentication that we as users are familiar with. Essentially, DANE is a way to cross-verify the domain name and the certificate-authority-issued certificate. The problem that DANE is trying to solve is that a TLS certificate does not verify that the organization running the web server officially owns the domain name. As well, the DNS record does not contain information about which certificate authority is preferred by the organization. This weakness was actually exploited twice in 2011, both with Comodo and with DigiNotar, the Dutch certificate authority, where attackers were able to generate false certificates, giving them the ability to perform man-in-the-middle attacks. So what DANE does is provide a way to cross-verify the domain name information with the host's certificate-authority-issued certificate. In the blog post that I will show again, I explain in a little more detail how DANE is set up with a DNS server. Just real quick, though, for those curious: the dnspython library does have support for DANE, with the ability to create and manage TLSA records. And Twisted Names has also been working on implementing DNSSEC, including DANE support. I don't know if he is in here, but Richard Wall, a Twisted core developer, is giving a talk about Twisted DNS and the progress that he's made with DNSSEC, so I highly encourage you to go.
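To make TLSA records a bit more concrete, here's a small sketch that builds the zone-file text of one from a certificate's bytes. The field meanings come from the TLSA spec (RFC 6698): the defaults below say "match this exact end-entity certificate (usage 3), the full cert (selector 0), by its SHA-256 digest (matching type 1)"; the domain and certificate bytes in the example are made up:

```python
import hashlib

def tlsa_record(port, proto, domain, cert_der,
                usage=3, selector=0, matching_type=1):
    """Build the zone-file text for a DANE TLSA record.
    TLSA owner names take the form _port._proto.domain."""
    digest = hashlib.sha256(cert_der).hexdigest()
    owner = f"_{port}._{proto}.{domain}."
    return f"{owner} IN TLSA {usage} {selector} {matching_type} {digest}"
```

So for a hypothetical cert on example.com's HTTPS endpoint, `tlsa_record(443, "tcp", "example.com", cert_bytes)` yields a record starting with `_443._tcp.example.com. IN TLSA 3 0 1` followed by the 64 hex characters of the digest, which is the shape of record dnspython can create and manage for you.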
Another nerdy nugget of awesomeness that I uncovered is that you can use DNS for service discovery. There are a few ways to use DNS for service discovery, but it mainly boils down to the question: what servers run this service? As mentioned, one can leverage DNS to help answer this question with the use of SRV records. SRV records within DNS zones map canonical names, typically in the form _servicename._protocol.site, to host names. At Spotify, each service has its own SRV record, with one record canonically named after the service itself. When you spin up a Spotify desktop client, it does an SRV lookup similar to this dig command, and here it finds four possible hosts. For the more visually inclined, and some of you are, this is a simplified diagram of what happens. So the Spotify client looks up where to connect with an SRV dig query. The service lookup then continues on. So the Spotify app connects to an access point, for example ap1.spotify.com, and then the access point resolves the service that the client is looking for, which might be, say, the user service, to grab your user information. This is all done with SRV records. The last little nugget I discovered is the ability to store a DHT ring within DNS. DHT stands for distributed hash table. A DHT gives you a dictionary-like interface, or a key-value store, but the data, or nodes, are distributed among a network. So looking at Spotify again: we store some service configuration data in a DHT ring within DNS TXT records. Once again, you open up the Spotify client and want to play a particular song, foobar, one that has not been locally cached on your machine. So the client performs a lookup on the song: the track ID is first hashed, and then that hash is essentially the key within the DHT ring. That particular key is then looked up within the DHT ring that is stored within the DNS TXT records.
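The steps so far, hash the track ID and find the owning interval in the ring, can be sketched like this. Everything here is made up for illustration (the hosts, the ring positions, and a 1-byte toy hash); the real ring lives in TXT records and uses a much wider hash space:

```python
import bisect
import hashlib

# A toy DHT ring: each node owns the interval of hash space
# up to and including its position. Hosts and positions are
# invented for this sketch.
RING = {
    0x40: "tracks1.example.net:4301",
    0x9E: "tracks2.example.net:4301",
    0xC1: "tracks3.example.net:4301",
}

def ring_lookup(track_id):
    """Hash the track ID, then walk clockwise around the ring
    to the first node at or past that position."""
    key = hashlib.md5(track_id.encode()).digest()[0]  # 1-byte toy hash
    positions = sorted(RING)
    i = bisect.bisect_left(positions, key)
    if i == len(positions):
        i = 0  # past the last node: wrap around the ring
    return RING[positions[i]]
```

The wrap-around is what makes it a ring rather than a plain sorted list: keys hashing past the last node belong to the first one.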
The value associated with that key is essentially the host location of the service where the song, or its relevant data, is located. So in this case, the interval from 9E to C1, which is where this particular track foobar lives, is mapped to a particular host. Completing this flow, that interval is mapped to a host name, which would be the machine that houses the data on the foobar track. This dummy host name tells me that this machine hosts information on tracks, can be connected to via port 4301, is located in our London data center, and is in pod A1. It's sort of a self-describing host name. A bit confusing, I know, but using DNS for a DHT ring allows Spotify to leverage the distributed characteristics of the DNS system. So hopefully I didn't lose you. Going over it real quick: DANE is a suggested way to cross-verify your certificate authority via DNS; you can also set up service discovery with SRV records to help figure out what servers run a given service; and finally, you can use DNS for a DHT ring. So yeah, I threw a lot at you. But DNS is by no means easy to understand within a 25-minute talk. And I guarantee you, you will screw up your DNS deployment configuration, because DNS is hard. It's a bit of a black box, because it's not easy to debug, and it's certainly not easy to make a good talk out of. But I hope that you learned some things, and if you want more information, you can check out the blog post. I'm out of time, but thank you so much.