All right. Hello, everybody. I saw this giant room they scheduled us for, for a talk on DNS. I was a little surprised, but actually it's filling up not too bad. My name is John Belamaric, I'm from Google, and this is Yong. You want to introduce yourself? Or I'll just do it: Yong is at Ivanti. And we are here to talk about CoreDNS. So what is CoreDNS? How many of you are actually familiar with it? If you've been using Kubernetes, you may have been using it without knowing it. How many of you are familiar with CoreDNS? That's pretty good. So yeah, when Kubernetes first started, it had a very strange DNS implementation that included dnsmasq running in a pod, along with etcd running in the same pod, and a special controller that wrote the dnsmasq config files so that it could serve things up. Yong and I, at the time, were at a company called Infoblox, which some of you may know, and we were like, this is insane. So our chief DNS guy there, Cricket Liu, hooked us up with somebody at Google who had been building this CoreDNS thing. We were like, what is this thing? So we actually worked with him, and we worked to integrate it with Kubernetes so that instead of having these three processes all running in a pod, with a lot of failure points and other issues, we had a single process that talked directly to the Kube API server and, of course, does a lot of other things. The whole goal behind it wasn't just to work with Kubernetes, but to work with cloud-native ecosystems in general. At that time, of course, Kubernetes wasn't the only container orchestrator of merit, and we intended to integrate with others. But Kubernetes won, and we primarily do Kubernetes now, although we have other integrations too, and people use it as just an ordinary DNS server. So hopefully you can check it out for some of that as well.
The idea is that instead of serving only from traditional zone files, you can serve DNS from anywhere, from any source. And you can put plugins into the request-processing pipeline of a DNS request, so you can do interesting and novel things that are really challenging to do in traditional DNS servers. Architecturally, we wrote it in Go. Or Miek, who first wrote it, wrote it in Go, and we enhanced it, and as a community we've continued to enhance it over the last few years. Service discovery, with or without Kubernetes, is a focus, but not the only thing we do. That plugin architecture allows us to do interesting things. We also try to do novel things, experiment, and see where things are going in the future; it provides an excellent base for that. We did some new and novel things like DNS over gRPC. That's not a standard; that is a CoreDNS thing. But people actually use it, and you can do really interesting things like pass distributed tracing spans across your DNS if you use it. So you can get visibility into your entire request flow, including the DNS portion, in ways that you really couldn't before. You can also use it to integrate with policy engines, which is something we do as well. The other integrations that are probably most commonly used, besides file and Kubernetes, are the ones to the cloud providers. Essentially, and we'll show more detail on this in a little bit, you can use the same DNS server, CoreDNS, to front for zones in any of your different clouds, so you can bring everything together in one place. We do have 350 contributors. Because it's a plugin architecture, what often happens is somebody has a particular interest, they develop a particular plugin, and obviously they become a contributor.
But if they're really interested in continuing on and maintaining and owning that plugin, then they can become a maintainer as well. So although 350 is a large number, it's actually a pretty small community, and we'd love to have any of you come and join us. Recently we've had, like I said, mostly contributions around individual plugins. There are some bigger things we'd like to do, but we need the right contributor with the right need to make those things happen. Recently our project lead, Miek Gieben, who started the project, decided to go on and do other things, so we've moved to a steering-committee governance model. That's the last item on this slide; it changes our governance model a little bit. But honestly, we've never had to escalate anything to the steering committee, because it's a pretty friendly place. So what have we been up to recently? We released 1.11.1 not that long ago. There's one new plugin in there, plus enhancements to existing plugins. As I said, we do some novel things. You see these plugins on here, things like template, which lets you dynamically generate records based on the metadata that comes in on the request. These are things you don't normally see in a DNS server, so I'd love to have people play with them and see what they think. I think that's it for the project updates. Yong, you want to talk a little bit about service discovery and some of the ideas there?

Okay, yes. So service discovery is obviously one of my favorite topics. (Your mic's not on.) Oh, and I wanted to say before that: we do have a bunch of swag up here, so after the talk you can come up and grab some. Yeah, I want to discuss service discovery. That's my favorite topic in DNS. Many people probably ask: service discovery by itself sounds interesting, but how is that related to DNS?
You know, DNS seems pretty simple, right? A DNS record is just one string with an IP address; what else do you expect? How is DNS going to contribute anything interesting to service discovery? I'll discuss that from several different angles. First of all, why does DNS still exist? Nowadays you have SDN, software-defined networking, that can define anything you like. You're not limited by a CIDR block; your IP address space can be assigned anywhere you want. If you want everything to be hard-coded, you can hard-code it. You can say your web service is 1.1.1.1, and inside a company that's probably going to work if you want to do it that way. So what's the point? What's the deal with DNS? One thing that's very important is that DNS is a nice indirection, and this indirection is very flexible. It's something you definitely want. Think of it this way: if you have corporate infrastructure and you want to make a change, even with software-defined networking where you hard-code those IP addresses so that people don't need to remember DNS names (which may make it sound like DNS is getting deprecated or falling out of favor), in reality changing an IP in SDN is still going to be a lot harder than changing a DNS record. You change your DNS record, especially with CoreDNS, and that's just one line of change. So that's why DNS still has a place. Secondly, DNS is distributed by nature. Many people don't realize this. When people talk about distributed systems, they talk about the Raft protocol, about how big a system can be, but many people don't realize that DNS is supporting a massively scalable distributed system: the internet. The whole internet is backed by DNS. So certainly it's scalable, and it's distributed in nature.
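As a concrete illustration of that indirection, here is a minimal sketch of what a one-line repoint can look like in a CoreDNS Corefile using the hosts plugin. The zone name and addresses are made up for the example:

```
example.com {
    hosts {
        # Repointing the web service is literally a one-line edit:
        # change this address and reload; clients keep using the name.
        10.0.0.42 web.example.com
        fallthrough
    }
}
```

Anything not matched by the inline hosts entries falls through to whatever plugin comes next in the chain.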
Another thing many people don't realize is that DNS is pervasive in your IT infrastructure. I assume many people here come from an operations, DevOps, or SRE background, but in the IT space people are still dealing with DNS: your company still has VPN, your company may still have internal websites, and you have to manage all of that. And what's the communication channel between those two worlds, your customers outside and your internal servers? DNS. DNS actually makes it very easy to manage both worlds. So that's why, even with SDN, with all the advancements in Kubernetes, you still see a place for DNS: it's pervasive in both the IT infrastructure and your DevOps environment. By the way, John mentioned that CoreDNS has support for a lot of different protocols. One interesting feature of CoreDNS is cloud sync: syncing DNS records between the cloud and your local CoreDNS. Why does cloud sync matter? With plain DNS, because DNS is distributed by nature, you can just forward to an upstream DNS server, say Google's 8.8.8.8, or forward to, for example, Cloudflare. I think Cloudflare is 9.9.9.9... no, it's 1.1.1.1. Anyway, the cloud sync in CoreDNS is different. When syncing with cloud vendors, CoreDNS takes a secure channel over HTTPS to the cloud vendor's API endpoint. So the communication is not done over UDP, which is less secure, but over HTTPS, with proper authentication and authorization. Because cloud sync runs over TCP, error handling is in a better place and the whole communication is much more reliable.
And finally, the separation of the data sync from the DNS query path is a big deal, because when you sync with the cloud vendors, you don't want to be blocked in the critical path of forwarding a DNS query. You authenticate, get authorized, pull the records from the cloud vendor back into CoreDNS, and then expose them over DNS locally to the services deployed locally. In the local environment, DNS over UDP seems to be okay, but if you want every query to traverse the whole chain, your local DNS server talking all the way back to the cloud vendor, that adds extra hops that can fail at any time, and that can be a critical problem in your operational environment. So we talked about cloud sync, and another interesting thing is that CoreDNS can integrate with multiple clouds. Nowadays, when people talk about multi-cloud, there are several reasons for it. One reason, obviously, is data sovereignty and data residency. Certain countries may not want data shared with another country, so in certain countries you may be forced to choose a cloud vendor that's different from the rest of the world. That's one thing. Another reason that came to my attention is M&A. Especially in the current high-interest-rate environment, financial events such as M&A have become more common. In an M&A, if two or three companies combine, you may not have a choice: you have to do multi-cloud, and you may not be able to easily migrate from one cloud to another. Has anyone here moved from one cloud to another in the past several years? Raise your hand. Oh, interesting, okay. How was the experience?
How was the experience of migrating from one cloud to another? Okay, I think everyone will agree that if your application or service architecture has been designed in a clean, rock-solid way, moving from one cloud vendor to another may be a smooth experience, right? But that's unlikely to be the case if you're dealing with an M&A situation, because in an M&A you're always dealing with all kinds of legacy code no one will touch. Whoever wrote it has already left the company to go do other interesting things, right? And in that situation, as an operations person, the only thing you want to do is not touch anything, just leave it alone. Yeah. Okay, so that's multi-cloud. Another thing is that CoreDNS consolidates diversified sources of information, as I mentioned: customer-facing information from cloud vendors, from Kubernetes, as well as from the corporate DNS servers that serve IT infrastructure. And the final point. People may ask: okay, nowadays you have AWS or Azure DNS, so why do you need CoreDNS as a DNS server? I think you still need to consider it, because even though, as far as I know, all three major cloud vendors already offer a 100% SLA, many people don't realize what a 100% SLA means. When we talk about an SLA, we have to think about the consequence. What if the SLA is not met? What if it's not 100%? If you read through the contract terms, you'll realize the only consequence is that they refund you the money for the time the service was unavailable. Unfortunately, as I said, DNS is a simple service and a cheap service. Okay, so the company I worked for at one point had a major loss because the DNS service from one cloud vendor was down for several hours.
That cost several million dollars in losses, but when we talked to the cloud vendor, we realized the money we could recover was so small that they even offered an Uber gift card as an option. Yes, because they realized that over those several hours you'd only spent maybe $100 on DNS. So, do you want $100 returned to your bank account, or do you want the Uber gift card as a goodwill gesture? Yeah, that's a true story; I experienced it. Okay, I think I need to speed up a little bit. So here's one architecture we've talked about in the past, but I want to go through it. You can do multi-cloud with a simple setup. You have a DNS server locally, which is CoreDNS, and you have infrastructure in both AWS and Google Cloud. Either your corporate IT or your multi-cloud deployment needs to expose certain services from different places, and different cloud vendors are going to serve that. So this is one example where you expose the DNS in a uniform way, even though the records are backed by different cloud vendors' cloud DNS. I'm going to show a Corefile to make it happen. This Corefile will actually work to achieve what I described on the previous slide. You only specify three sections. The first section is for Route 53: you specify how CoreDNS talks to Route 53 to fetch records, and if a record is not available in Route 53, you set up the fallthrough line. That means CoreDNS will iterate through the plugins and fall through to the next plugin, which is Cloud DNS, which is Google Cloud, and search for the record there. And if your DNS record is still not available, the service isn't on Google Cloud either, then you can fall back to your local CoreDNS data to figure out where the service is located.
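A Corefile along the lines just described looks roughly like this. The zone name, hosted-zone ID, project name, and file path are placeholders; check the route53 and clouddns plugin documentation for the exact option names:

```
example.com {
    # 1. Look in Amazon Route 53 first. Records are fetched over the
    #    vendor's authenticated HTTPS API, not by DNS forwarding.
    route53 example.com.:Z0123456789ABCDEF {
        fallthrough
    }
    # 2. Not found? Fall through to Google Cloud DNS.
    clouddns example.com.:my-gcp-project:example-com-zone {
        fallthrough
    }
    # 3. Still not found? Fall back to local data, for example
    #    internal corporate services kept in a zone file.
    file /etc/coredns/db.example.com
}
```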
Of course, that can be your corporate IT environment, or some internal services you want to expose. Okay, so those are some examples of how you can do multi-cloud service discovery. I'm going to hand it back to John to talk about a demo plugin for CoreDNS.

Thanks, Yong. So we've mentioned a few times that it's a pluggable architecture, and one of the things you might want to do is create a plugin. We'll go through that briefly here. I'll try to do it in six and a half minutes so we have ten minutes for questions. We're going to write a plugin where, whenever a DNS request comes in, we look at certain information it contains, and one of the most obvious things it contains is the source IP. That's something you might differentiate on. There's other information you might attach; I don't know how many of you are intimately familiar with DNS, but there are extensions to DNS that allow you to attach other information. But requests always have a source IP, because you always have to get the response back. So we'll do a very simple plugin: if the source IP is in a particular subnet, we return one value, otherwise we return another value. It's a silly plugin you'd never actually use, but it serves as a decent example. Okay, so there are basically just a couple of things you have to do to create a plugin. The first thing to understand is that, as Yong mentioned, CoreDNS is configured with something called a Corefile. It's not a core dump, it's a Corefile. CoreDNS was originally based on an early version of Caddy, which is a web server, so it uses the same configuration format, or a slightly derivative format of that.
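To make that format concrete, a small hypothetical Corefile looks like this:

```
# Each stanza names one or more zones; inside it you list the
# plugins to enable for those zones, with their settings.
example.org:53 {
    file /etc/coredns/db.example.org
    log
}

# The "." zone matches everything not handled above.
.:53 {
    forward . 8.8.8.8
    cache 30
}
```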
So basically, the Corefile contains stanzas for each zone, and within each zone you define which plugins you want to enable and the specific configuration for those plugins. So when you're writing a plugin, the first thing you have to do is create a function that parses that configuration and sets up your internal state. That's essentially what the setup function does, and typically it lives in a setup.go, like we're showing here. The init function is the module init. If you're not familiar with Go, init is called once, automatically, when the module is initialized at the time the binary starts up. It's used to register the plugin with the Caddy infrastructure that parses the configuration file. It's basically saying: hey, I'm a plugin, if you see this keyword, invoke me. You pass it your setup function and the directive to watch for, and then the main server will call into your plugin if that directive appears in the configuration file. It passes you the parameters, and you populate your internal structures, whatever they are. So that's setup.go, pretty simple. Then you have something that actually performs your request processing: the ServeDNS method. This is the only method you have to implement to create a plugin that processes requests. It gets a request in, plus a response writer, a place to write the response back to, and does its thing. If it doesn't know what to do with the request, like when it checks "is this a request in a zone I manage?" and it's not, it passes it down the chain. Or, in the example Yong talked about, fallthrough lets you do that same "do I manage this request?" process even within a single zone. Normally in CoreDNS, a backend plugin manages and owns an entire zone.
But you can divide that up with fallthrough and say: this zone, I manage some part of it, but I'm not going to manage all of it. Even with Kubernetes we do this, on reverse lookups. On reverse lookups, the Kubernetes plugin needs to manage the service CIDR, and the pod CIDR if there is one, if you're doing pod IPs, pod DNS. But the service CIDR at minimum, and on a reverse lookup, if something doesn't fall within the service CIDR, we want to kick it out to whatever else handles reverse lookups. So fallthrough is used there, for example. Okay, I took too much time on that. So the init function, like I said, is super simple: register the plugin under the name demo and point at the setup function. Then during config parsing it goes through the config and adds itself to the plugin chain. Very straightforward. ServeDNS: we take a look at the request and decide, if the source IP starts with 172 or 127, reply with this; otherwise use the 8.8.8.8 reply. Add that to our list of responses, and we're done. This is literally, what, 20 lines of code to write that plugin? So plugins are super easy to write: you only need to think about the specific functionality you care about, and you can hand everything else off to the rest of the system. The Corefile ends up looking like this. It says this Corefile manages dot, so everything, and demo is going to handle everything, and this is how to run it. If you pull this deck down off of Sched afterward, you can copy this out of it; that's the link. And I did it in five and a half minutes, hopefully not leaving too many questions. Before we go to Q&A: we'd love to have you come join in, especially if you've got a unique plugin; we have tons of external plugins. There are two classes of plugins.
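Putting the pieces of that walkthrough together, a sketch of the whole demo plugin might look like the following. This is written against the CoreDNS plugin API as I understand it, so treat the details as approximate; it compiles as part of a CoreDNS source tree (it imports the coredns and miekg/dns modules), not as a standalone program, and the "internal" answer address is made up.

```go
// Package demo is a toy CoreDNS plugin: it answers every A query
// with one address for "local" source IPs and another otherwise.
package demo

import (
	"context"
	"net"
	"strings"

	"github.com/coredns/caddy"
	"github.com/coredns/coredns/core/dnsserver"
	"github.com/coredns/coredns/plugin"
	"github.com/coredns/coredns/request"
	"github.com/miekg/dns"
)

// init registers the plugin under the "demo" directive, so the
// Corefile parser invokes setup when it sees that keyword.
func init() { plugin.Register("demo", setup) }

// setup parses the (empty) "demo" directive and inserts the plugin
// into this server block's plugin chain.
func setup(c *caddy.Controller) error {
	c.Next() // consume the "demo" token itself
	dnsserver.GetConfig(c).AddPlugin(func(next plugin.Handler) plugin.Handler {
		return Demo{Next: next}
	})
	return nil
}

// Demo implements plugin.Handler.
type Demo struct {
	Next plugin.Handler
}

// ServeDNS is the one required request-processing method.
func (d Demo) ServeDNS(ctx context.Context, w dns.ResponseWriter, r *dns.Msg) (int, error) {
	state := request.Request{W: w, Req: r}

	// Differentiate on the client's source IP, as in the talk.
	answer := "8.8.8.8"
	if ip := state.IP(); strings.HasPrefix(ip, "172.") || strings.HasPrefix(ip, "127.") {
		answer = "10.0.0.1" // made-up "internal" answer
	}

	m := new(dns.Msg)
	m.SetReply(r)
	m.Answer = []dns.RR{&dns.A{
		Hdr: dns.RR_Header{Name: state.QName(), Rrtype: dns.TypeA,
			Class: dns.ClassINET, Ttl: 60},
		A: net.ParseIP(answer),
	}}
	w.WriteMsg(m)
	return dns.RcodeSuccess, nil
}

// Name satisfies plugin.Handler.
func (d Demo) Name() string { return "demo" }
```

A Corefile enabling it for everything is then just a dot stanza containing the demo directive, as shown on the slide.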
There are the ones that are built into the binary that we build, and then there are external plugins. It's pretty easy, but plugins are not dynamically loaded. If you're familiar with Go, Go is not great at dynamically loading things (newer versions have some support); plugins are just compiled in. But it's quick and easy to compile, so you just add a line to a file, compile, and you've got your own custom CoreDNS with your plugin in it. The other kind of gotcha is that the plugin chain order is not related to the Corefile; it's the compiled-in ordering. Those are probably the two biggest gotchas with plugins, but otherwise it's pretty straightforward. So thank you, and I think we can go on to Q&A. Any questions? You will get first pick of the swag. Yes sir.

So the question is: in the Corefile, do you have to follow the order of the plugins? No, you can put them in any order, but when a request happens, it will be processed in the compiled-in order of the plugins. So for example, there's a plugin to handle files (zone files), there's a plugin to handle Kubernetes, and there's a plugin to handle Cloud DNS from Google. They're in a specific order, so the first plugin that gets the opportunity to handle the request is going to be the first one in that compiled-in order, regardless of the order you enable them in within the Corefile. And now you can grab something. Yes sir. It might be better, since it's recorded, that you use the mic; that's why I repeated the question last time.

Yeah, I have two questions, hopefully they'll be quick. The first one is for the multi-cluster example. Are you putting that configuration in a separate CoreDNS that you're running outside of your Kubernetes cluster, or just the one that comes with it?
So basically, when we talk about multi-cloud, we're talking about services deployed into multiple cloud environments, or your local environment. Your DNS data is scattered around, but there's one CoreDNS sitting at the center of the information, so you can see through to all the backend information, from Google Cloud, from AWS, from Azure. There's one place you can treat as the source of truth. Yeah, and it has nothing to do with the CoreDNS that's running in Kubernetes. It could be completely separate; you could be running it in Docker, you could be running it on a host. And in fact, I would recommend that you don't do anything else in the Kubernetes one. We have things like the k8s_external plugin, where you can take things like your services, or the external IPs on your Kubernetes services, and expose them via DNS. Even for that, I would recommend you run a separate instance, because that way you don't threaten your internal cluster DNS with external traffic, which could overwhelm it.

Yeah, that was interesting, because that plugin was actually the thing I immediately looked up. That's interesting. And then the second question I had: we have a fairly niche use case. We're using CoreDNS SRV records for service discovery between multiple layers in Kubernetes, and we'd like to be able to set the weighting on a particular instance on the fly. Do you know if there's anything off the shelf that does that in CoreDNS, or is that something we'd want to write a plugin for and put in the chain after the Kubernetes plugin?

So we have a few interesting plugins. I don't know if they do that off the shelf. We have one called rewrite. I don't know if it can rewrite weights on SRV records; it's usually for rewriting names and name responses, but you can rewrite some header fields. You should look at it.
I'm not sure it does that. There's a template plugin that probably doesn't do what you want, because you'd have to know everything up front. You could fairly easily implement something yourself, and you might want to look at the policy plugin. It's an external plugin, I think, but it actually allows you to do pretty sophisticated things. And we have what we call the metadata plugin: if you enable it, it takes information that's embedded in the request in various ways and makes it available. Each plugin can enhance the request with metadata. So for example, the Kubernetes plugin can add the service name, the pod name, the namespace, depending on what information it has, and then you can use that in later plugins, in the policy plugin for example.

I was going to ask that next: if you were to write your own plugin, would you embed essentially your own Kubernetes client in there to get that information, if you wanted to key on the pod, for example?

So that's actually an interesting idea. Yeah, there's a lot of trickiness there, and we can talk afterwards if you want, about mapping source IPs to pod IPs. It's not easy; it doesn't work as well as you'd hope, put it that way.

Thank you. Thanks for your presentation. I had a question. We run EKS, and lately we've had problems with CoreDNS timing out extensively. I don't know if it's a CoreDNS problem or an EKS problem. Our workaround was using NodeLocal DNSCache on each node. I was wondering how you deal with that kind of situation on EKS, if you're familiar with it?

The microphone's a little hard to understand. You're saying you're using NodeLocal DNSCache?

Yeah, we ran into massive timeouts on CoreDNS, so EKS pods could not resolve anything, and we couldn't figure out what was going on. Apparently it's a common problem on EKS; there are a lot of people with the same problem.
Our solution was to use NodeLocal DNSCache, and that seems to solve the problem. But I was surprised that CoreDNS had issues with EKS.

It's probably not CoreDNS. So there are a number of issues we've seen over the years, though I'd be surprised if we're still seeing some of them, because they relate to kernel bugs and UDP timeouts in the conntrack table; those are some of the really old ones. EKS, depending on the kind of node you have, has network request limits that in DNS terms are really low, like 50k QPS. Now, in most people's circumstances, a node doing 50k QPS is a high number. But when we were doing testing back at Infoblox, even five years ago, we couldn't get Amazon nodes to deliver very good DNS service because of all those limits. So there's a lot that can go into it. I'd love to talk more afterwards; it's really hard without getting more detail. But the stuff that's served out of Kubernetes is literally straight out of memory. You don't even need a DNS cache; it's all already in memory, the whole thing's loaded, so it should be almost instantaneous. So it's likely a networking issue in between, or UDP conntrack. With UDP, the entries in the kernel's connection-tracking table have to time out, whereas with TCP they get removed as soon as the connection is closed. That's why one of the things NodeLocal DNS does is upgrade the connection from the node to the central CoreDNS to TCP, so it won't run into these conntrack issues. So that's one of the...

Yeah, I want to add a couple of things. Based on my past experience, it's rarely actually a DNS issue. I would normally look at the networking first, to make sure the path from your service, from where the DNS request originates, can reach the DNS server, and try to dig into the networking issue, because that's the most likely scenario.
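The TCP upgrade described here is visible in NodeLocal DNSCache's configuration: it is essentially a caching CoreDNS on each node whose forward plugin uses the force_tcp option. A simplified sketch, where 10.96.0.10 is a placeholder for the cluster DNS service IP:

```
cluster.local:53 {
    cache 30
    # Forward cluster queries to the central cluster DNS over TCP,
    # sidestepping UDP conntrack-timeout issues on the node.
    forward . 10.96.0.10 {
        force_tcp
    }
}
```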
As John mentioned, that's one reason we use TCP when we do the cloud sync. People may believe UDP is more efficient, but that's not necessarily the case, because other than DNS, who else uses UDP? Optimization work, in the Linux kernel and everywhere else, focuses on TCP. TCP is normally highly optimized, while UDP falls behind; in fact, there's not a lot of widely used UDP software other than DNS. That's why, if there's any issue, it's likely going to be a networking issue or a UDP-related kernel issue. But most of the time, unless you're doing something really massive (No, we're not.) or your DNS is serving records with very short TTLs, it's very unlikely your DNS will be flooded.

We tried a bunch of those: increased TTLs, increased cache, and a bunch of other things. It worked for a while, then it started failing again.

Well, talk to us after the presentation, and we'll get more information on how we can solve the problem. Thank you. Okay, sure, thank you.

Hi there, I'm a manager in a room of engineers, so this might be a very stupid question. Thinking of modern practice, by which engineers are developing DevSecOps platforms and consuming them: one of the patterns we see is developers using bastions within AWS, on EC2s, and then we have our DevSecOps workloads such as Kubernetes and Nexus running within the cluster. As we have those resolving with publicly trusted certs, what is a method by which we consistently resolve those records, both in cluster and out of cluster? That might not be a plugin question; if it's out of scope for this talk, I apologize.

For some reason it's hard; people are soft-spoken and it's really hard to understand, maybe because we're behind the speakers. Are you saying you want to do DNS resolution for workloads that are in the cluster and out of the cluster?
Workloads that are within the cluster, but providing that experience consistently. So let's say Istio creates an internal load balancer; within the cluster we know what it is and can manage and resolve it, and we can also register it in Route 53 through some manual operations so the ops flows work outside the cluster as well. How do we manage those custom DNS records?

Custom DNS records from within the cluster. So I think you can do that; there are a couple of options, right? One option is you can create ExternalName services, and then it just works as it does today. The other is to use additional plugins within your Kubernetes CoreDNS instance. I talked about not doing that earlier, but that was more because external traffic coming into the cluster could overwhelm your cluster DNS and cause things within the cluster to fail. But if what you're doing is letting things within the cluster get DNS access to things that don't live in the cluster, then these additional plugins would be just the right thing for it, I think.

Thank you. Yeah, I think we're over time, so thank you very much. Come and grab some swag; I don't want to take any home. So, thank you.
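That last answer, sketched as a Corefile for the cluster's CoreDNS, is essentially a stub domain. The zone name and upstream addresses here are made up:

```
# Resolve a corporate zone via dedicated upstreams so pods see the
# same records as out-of-cluster clients.
corp.example.com:53 {
    forward . 10.10.0.53 10.10.0.54
}

# Normal cluster DNS for everything else.
.:53 {
    kubernetes cluster.local in-addr.arpa ip6.arpa
    forward . /etc/resolv.conf
}
```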