I'll get started with a summary of what I'm going to talk about, in case you want to bail out. I'll give a short presentation of the background and related standards, the places where Istio and DNS interact. I'll describe briefly how Istio implemented its own DNS resolver, its own DNS solution. And I'll go over the use cases where Istio is most affected by DNS, which are security, multi-cluster, and egress with service entries. For each of them I'll try to take a balanced approach, with the options and the trade-offs of each solution, and hopefully give you some ideas that you can use in your own deployments.

DNS is a very, very old protocol: the first RFC was published in 1983. It's one of the ossified protocols, meaning that it's still in use today more or less exactly as it was then. It is very difficult to change. A lot of people have used and abused DNS in many ways. Like all the old protocols, security was not part of the original design. It was retrofitted later as the DNS Security Extensions (DNSSEC), and far later, thanks to privacy concerns and browser vendors, DNS over TLS and DNS over HTTPS were introduced in 2016-18. Related standards: IPsec in 1995, HTTP/1.1, also without security, in 1997, and finally HTTPS in 2000.

To give an idea, out of 150 million .com domains, about 6 million are currently signed. That's actually very good, because many of the .com domains are basically just vanity domains or squatting, so 6 million signed domains is a good number. And we'll see why signing domains and using those protocols is very important for Istio users, and for all mesh users, actually.

As a concept, a reminder: DNS is a top-down protocol. It's based on authoritative name servers that basically own a domain. It starts from the root servers, goes to top-level domains like com, fr and so forth, and the authority is delegated down to subdomains and their authoritative servers.
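
As an illustration of the top-down delegation just described, here is a minimal sketch in Python. It is a toy model, not a real resolver: the zone table, the record, and the address (taken from a documentation range) are made up for the example.

```python
# Toy model of top-down DNS delegation. All zone data is hypothetical.
ZONES = {
    ".": {"delegations": ["com."], "records": {}},
    "com.": {"delegations": ["example.com."], "records": {}},
    "example.com.": {
        "delegations": [],
        "records": {"www.example.com.": "192.0.2.10"},
    },
}


def resolve(name):
    """Walk from the root zone down, following delegations.

    Returns (address or None, list of zones visited).
    """
    zone = "."
    path = [zone]
    while True:
        data = ZONES[zone]
        if name in data["records"]:
            return data["records"][name], path
        # Pick the child zone that the name falls under, if any.
        child = next(
            (c for c in data["delegations"] if name == c or name.endswith("." + c)),
            None,
        )
        if child is None:
            return None, path  # nobody below this zone owns the name
        zone = child
        path.append(zone)
```

Resolving `www.example.com.` visits the root, then `com.`, then `example.com.`, which mirrors the walk a recursive resolver performs on behalf of a client.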
The API server is also a kind of source of truth for all the names in Kubernetes, and it has an associated DNS for cluster.local, which is the equivalent of an authority on the Internet.

Then, on the other side, there are the recursive resolvers, which are the resolvers that go to the roots, walk down, and find the actual entries that you're looking for. That's where, hopefully, DNS signatures are validated, and where a lot of other features that we'll discuss later are implemented. Next are the non-recursive resolvers, which are usually just caches running on your VM or on your physical machine. Their main role in modern DNS is to secure the communication between the host and the recursive resolvers. And finally there are stub resolvers, or client libraries, that are linked into your application and usually don't do much. But recently we've seen people adding more features and more security to the client libraries.

Some common terms in DNS: split horizon, where the external view of a domain is different from the internal view of the domain. Private DNS, where the resolvers, usually the recursive resolvers, create their own internal domains that are only visible inside your organization. And of course DNS hijacking and spoofing, which have been around for 30 or 40 years. DNS is one of the protocols most misused by internet providers and by a lot of attackers.

There are many implementations of DNS. I'll mention BIND, which is still in use today. It was first released in 1986 and has been rewritten a few times. But dnsmasq, from 2001, is probably the most popular, because it's on all routers and most phones, and it's still used in Kubernetes. Then there are PowerDNS, Unbound, and a lot of high-performance servers. The DNS community generally prefers to have many implementations, so we have a diversity of implementations and CVEs and bugs don't affect the entire internet.
Most recently there is CoreDNS, which is the default DNS server in Kubernetes. Stubby is an interesting one, which just converts plain DNS to DNS over TLS, providing security. And of course Istio DNS, which we'll discuss soon.

One side note: there is DNS over TLS, which provides security, but there is also tunneling over DNS. This is also from the late 90s, when people figured out that, just like we tunnel over HTTP/2 in ztunnel, you can tunnel over DNS requests. You basically encode your packets as DNS requests, which look to the resolver like ordinary DNS requests. You send them to an authoritative server, which disguises the responses as DNS responses. And you establish a communication channel that can be used to bypass firewalls, network policies, and a lot of other things. Many people forgot about this. It's not very well advertised, but it still works perfectly fine. There is iodine, and there are other solutions, that work perfectly fine for this purpose. People who remember this usually mitigate it by blocking external DNS access, and that impacts Istio a bit. Few of us probably have to deal with this, but it may happen, and then, of course, DNS will be blamed.

SOCKS and HTTP proxy protocols are designed to avoid applications having to make DNS requests. They do the resolution on the proxy, and the application doesn't do any DNS. Usually, when you do this, you need to be careful not to leave any recursive resolver running somewhere that can be used as a way to bypass the firewalls.

On Istio's side, we implemented this DNS stub. It is a stub in DNS terminology. It is enabled at the pod level, or in a VM if you're running in a VM, through some environment variables. You can look it up in the documentation; there are a lot of docs on our site. You can also enable it globally for the entire mesh. One thing to keep in mind, first of all: we are not using DNS over TLS. We used it in the early days.
We use DNS over XDS in order to optimize and to take advantage of the existing connection between the agent and istiod. It's still over TLS, but with some layers in between. Istio DNS is not a recursive resolver and is not a validating resolver. It always falls back to the platform resolver for any names that it doesn't know. So if a name is not a cluster.local name, and it's not something that you have in a service entry, the platform resolver will be invoked, and hopefully it's secure, or you're not using egress.

As an alternative to Istio DNS, you can use a platform resolver. A platform resolver is what you find at most cloud vendors. They provide APIs to configure private DNS zones and provide a resolver infrastructure. In most cases, I can bet, you have a secure DNS infrastructure already, because after 40 years of DNS people have figured out that they need security. But it's useful to check that your DNS infrastructure is set up properly, that it's scaled properly, and that it is using access controls. Most recently, thanks to the ACME protocol and some others, there are a lot of libraries that allow programmatic access to DNS. ExternalDNS is an example of using Kubernetes APIs to program DNS. So you probably want to check what options you have available, because we'll discuss some automations that you can use to take advantage of the existing resolver infrastructure.

Let me start with the first use case, and the reason we started to implement DNS in Istio, which is security. For security we usually think about how to authenticate a client, but the first step is for the client to authenticate the server. If you talk with example.com, your application will make a DNS request and get back a response. Istio, and pretty much all meshes that rely on iptables interception, will look at the address, find a listener, and then match it with a cluster, where it finds the identity it needs to check.
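
The lookup chain described above (DNS answer, then destination address, then the identity to verify) can be sketched in Python as follows. This is a simplified illustration, not Istio code: the `IDENTITY_BY_IP` table and the `connect` helper are hypothetical stand-ins for the listener and cluster matching, and the addresses are invented.

```python
# Hypothetical stand-in for listener/cluster matching: the proxy maps the
# intercepted destination IP to the server identity it will verify.
IDENTITY_BY_IP = {
    "10.0.0.5": "example.com",
    "10.0.0.66": "evil.com",
}


def connect(hostname, dns_table):
    """Resolve like the application does, then pick an identity like an
    intercepting proxy does. The proxy sees only the IP, never the name."""
    ip = dns_table[hostname]           # application-side DNS lookup
    identity = IDENTITY_BY_IP.get(ip)  # proxy-side decision, based on IP alone
    return ip, identity
```

With an honest DNS answer such as `{"example.com": "10.0.0.5"}`, the proxy verifies `example.com`. With a spoofed answer pointing at `10.0.0.66`, the same request ends up verifying `evil.com`, and the certificate check succeeds against the wrong peer, because the name the application asked for never reaches the proxy.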
Now, this works perfectly: mTLS will do all the validation and everything is good. But if DNS returns the wrong address, Istio will have no idea that you actually wanted to talk with example.com and not evil.com. So Istio will take the address that it gets from iptables and validate the certificate of evil.com, which is apparently what the user wants to talk with, and then things go wrong. Nothing new, nothing unique; this has been known since the 90s. But it's important to be aware of it and to always make sure you have a secure DNS infrastructure. It is not a problem for HTTPS, because the host header is used to decide what certificate to validate, and it's not a problem for proxyless gRPC, SOCKS, or HTTPS proxy, which do not use IPs at all; they use FQDNs.

As I mentioned, almost everyone has a secure DNS infrastructure in the cloud, so this use case is not something you should be scared about; it's well known. Istio DNS was introduced for users who have their own setup, Raspberry Pis or some kind of off-prem environment, and who lack a secure DNS infrastructure. In that case it's very useful either to enable Istio DNS or to get a secure DNS resolver.

The second use case for Istio and DNS is multi-cluster and name visibility. In Kubernetes, as you know, cluster.local means local to a cluster. There is a CoreDNS server that takes care of resolving the names inside the cluster for the pods running in the cluster. If you have a VM, or another cluster, the name has no meaning for it; it doesn't exist. So what do we do? People have tried different hacks, but in the end istiod is watching all the clusters, so we can aggregate all the services from all the clusters and provide them to the resolver that we run in the agent. This is good or bad. There are people who believe it's very convenient, and I completely agree. It's a very nice feature.
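
To make the aggregation concrete, here is a small Python sketch of merging per-cluster service tables into one name table. It is only an illustration: the `aggregate` function and the input shape are hypothetical, and it assumes that a name appearing in two clusters is treated as one service with endpoints in both.

```python
from collections import defaultdict


def aggregate(clusters):
    """Merge {cluster: {fqdn: [ip, ...]}} into {fqdn: [(cluster, ip), ...]}.

    Assumption for this sketch: identical names in different clusters
    are merged into a single entry.
    """
    merged = defaultdict(list)
    for cluster_name, services in clusters.items():
        for fqdn, endpoints in services.items():
            merged[fqdn].extend((cluster_name, ip) for ip in endpoints)
    return dict(merged)
```

Every cluster.local name from every cluster ends up in the single merged table, whether or not it was meant to be shared.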
On the other side, it combines all the names from all the clusters into one big mesh that is called cluster.local, and that may not be ideal for all situations. And it requires using Istio DNS, which is a relatively new product, a new implementation, unlike BIND, which has many, many years of experience behind it.

The other option that you have is to use the platform resolver. In that case you use some automation tool, like Terraform or whatever you prefer, to program the platform DNS resolver with a private zone. You could use a name like clusterset.local (well, this one is taken by Kubernetes multi-cluster), but you can use example.com or prod.example.com. You can program only the names that you actually care about, the ones that need to be shared across clusters; not all the services need to be programmed. And you'll be able to use your own domain, like prod.mycompany.com, instead of cluster.local. Then you'll obviously use service entries to make sure that your domain names are known to each cluster. It's not very complicated in Kubernetes. It's also the approach that Kubernetes multi-cluster is taking by defining clusterset.local, which is a zone that is visible to all clusters in a multi-cluster environment.

Now, finally, we get to the interesting part: TCP egress, and egress in general. Let's say you have a mesh and you want to talk with an external provider's site, let's say www.example.com, which has two public IP addresses. For SNI, HTTPS, and HTTP we usually have the host header, which has www.example.com, and we know exactly where to go. But for TCP, with Istio interception, all we can use for deciding what policies to apply is the IP address returned by DNS. Modern DNS is used for a lot of things. It's used for load balancing, and it's used to make sure that the client gets the IP of a data center that is close by regionally. It's also used with anycast, and usually IPs are massively reused.
We are short on IPs these days, so most content distribution networks don't give an IP to each customer. So it is very difficult to get a unique IP in order to apply the policies. And that's where Istio DNS is at its most powerful.

The solution in Istio is obviously to create a service entry with the name of the domain and assign a VIP, some IPs that will be used in the policies. Once we do this, Istio will know that 10.1.1.3 is actually the external site you want to apply policies to. However, you still need DNS to resolve example.com to this address instead of the address it normally resolves to, which is unpredictable. To make things even more interesting, that is only what the application needs to resolve. When the request gets to the sidecar, Envoy obviously needs to go to the actual destination, so it needs to resolve the real address. So we need a way to have DNS return two different results, one to the application and one to Envoy. And that's pretty much what Istio DNS is doing. It intercepts DNS only for the application; DNS from Envoy is excluded. Everything works relatively fine.

We solved the problem, but we introduced another small problem: now any namespace owner has a way to hijack DNS and inject arbitrary IPs into it, which, if you remember from a few slides back, is not something that we want. So if you have this kind of use case and you enable Istio DNS, please use OPA or Kyverno validations to make sure that there are sufficient controls on whoever is allowed to create service entries: restricting them to a single namespace, validating the host names that are pushed, requiring reviews, or any other technique that prevents misuse. It's also very useful to segregate HTTP from TCP and to use dedicated TCP ports for all external services. If you use a dedicated port, the IP no longer matters, so you can apply policies based on the port only.
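
The two DNS views described above (a VIP for the application, the real address for Envoy), together with one of the suggested controls on who may claim a host name, can be sketched in Python like this. It is a toy model under stated assumptions, not how Istio DNS or any policy engine is implemented: the tables, the namespace name `team-a`, and the public addresses (taken from documentation ranges) are all hypothetical. Only 10.1.1.3 comes from the talk.

```python
# Hypothetical data, for illustration only.
SERVICE_ENTRY_VIPS = {"example.com": "10.1.1.3"}  # VIP assigned via a service entry
PLATFORM_DNS = {                                  # what the platform resolver returns
    "example.com": "203.0.113.7",
    "other.example.net": "198.51.100.20",
}
ALLOWED_HOSTS = {"team-a": {"example.com"}}       # hosts each namespace may claim


def resolve(name, caller):
    """Answer a query for `name`; `caller` is "app" or "envoy"."""
    if caller == "app" and name in SERVICE_ENTRY_VIPS:
        return SERVICE_ENTRY_VIPS[name]  # the application sees the VIP
    # Envoy's queries, and names the stub does not know, go to the platform resolver.
    return PLATFORM_DNS.get(name)


def may_create_service_entry(namespace, host):
    """Admission-style check: a namespace may only claim allow-listed hosts."""
    return host in ALLOWED_HOSTS.get(namespace, set())
```

Here `resolve("example.com", "app")` yields the VIP while `resolve("example.com", "envoy")` yields the real address. The allowlist check is a stand-in for the kind of rule one might express in OPA or Kyverno. Without it, any namespace could add an entry to the VIP table and redirect the application's traffic.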
And if you have HTTP segregated from TCP, HTTP policies will be based on the host, so again the IP no longer matters. With those mitigations in place, you can use a platform resolver instead of Istio DNS if you choose to. Another option that exists, for egress in particular, is to set the environment variables necessary to enable a SOCKS or HTTP proxy, in which case all the policies are applied in the actual egress gateway and you avoid all the DNS lookups and many DNS problems.

I'll also mention a solution for this particular problem that is perhaps complicated, but very clean. You create a private zone under your own prod.example.com and create subdomains for the external names, separating the front end, which will be something like externaldomain.egress.example.com, so basically under your own control. That's what the application will talk with, so you modify the application config to talk with this new domain. This you can control; you can resolve it however you want. As I mentioned, there are usually all kinds of access controls for private DNS. And when you write the virtual service, you go to the actual DNS name. So we avoid all the problems related to the application, and we avoid needing different results for the same DNS query. As an extra benefit, you get to use the infrastructure DNS, which is usually more mature: you have audit logs and all kinds of telemetry.

Those are the three use cases that are most important, though there are other implications for DNS. Since everyone is talking about ambient, I want to clarify a bit. Ambient also provides a DNS stub, so it has a resolver.
It behaves slightly differently from the normal Istio agent DNS resolver, in the sense that ambient is per node, so we lose the ability to return different DNS results to different pods through creative use of service entries, exportTo, Sidecar imports, and all the other tricks that we probably all know and love. As for the rest, segregation between HTTP and TCP doesn't help much, because waypoints are producer-side, so by the time a request gets to the waypoint the client no longer has any control. But a nice thing is that ztunnel has built-in support for SOCKS. So if you use the SOCKS protocol, you pretty much solve all the problems. You don't even need interception; it goes straight through. It's the most straightforward way, at the price of an environment variable set on your application to enable SOCKS.

Briefly, about the future, because we're out of time. DNS has evolved a lot over the years. We know about resolving a name to an IP, but in reality DNS is also used to distribute certificates. You can use it for SSH public key distribution. There is DANE, which is a way to publish certificates, or hashes of certificates, in DNS, and there are the DoT and DoH standards, which will hopefully see more adoption and more validation, perhaps in the Istio agent. Keep track of DNS, because it is a very critical infrastructure component. And usually when something breaks, DNS probably plays a role as well. So that's it. Questions, please.

Any questions for Costin? We have a few more minutes.

Oh, hi. Sorry if this is a very specific question. We use the DNS interception provided by Envoy in Istio, but we couldn't find the Prometheus metrics to understand better whether it's working correctly, how fast it is, [unclear]. Are there any, and if there are, how can I enable them?

I'm sorry. Istio DNS is a relatively new technology in terms of DNS history. We do not have a lot of those extras. Hopefully there will be contributors who will add them. What can I do?
But if you use an infrastructure resolver instead, most likely you'll have pretty much all the telemetry you want, plus a lot of other stuff. Trade-offs.

Yeah, great question. Any other questions from the audience? At this moment I'm not seeing any questions. I want to thank Costin again for a great deep dive on Istio and DNS. Thank you so much.