All right. I'd like to welcome our next speaker. Matt's going to be speaking to us today about hiding in plain sight: disguising HTTPS traffic with domain fronting. Take it away, Matt. Thank you. Hi, my name is Matt. I'm a software developer, and this is my first time at DEF CON. Can everyone hear me up the back? I heard there were some sound issues, but lots of thumbs in the air. Thanks very much. Cool. So a lot of us have probably heard about domain fronting. It got a fair bit of attention recently when Signal Messenger made some noise after, I think, Google said, hey, we're going to put an end to this. So there were lots of questions about what it is. There was an academic paper floating around, and I don't really know how to read papers like that, so I asked my friend to help me out. He explained it to me and I sort of figured it out. So I thought, let's bring it here and explain it in simple terms, maybe to people who don't even necessarily know what TLS is or only have a rough understanding. For some people that means going to a basic level, but that way more people can understand, so hopefully we get the right balance. So what is it? I put on my dictionary writer hat and tried to come up with a dictionary definition: it's abusing an implementation detail of shared infrastructure to disguise the true destination of an HTTPS transaction. There are a few key words in here. "Implementation detail" is one big one and "shared infrastructure" is another, and we'll go into why these parts are important later. And HTTPS: as far as I know, there aren't really any other protocols where the one place that terminates connections routes them on to lots of different destinations like this. Domain fronting is not new. It's been around for quite a while. I didn't do an exhaustive search, but this is the oldest thing I could find that uses it. It's some kind of proxy written in Python, the code's on GitHub, and the commit message says "abstract host substitution."
Maybe that's the old name for it. Cool, so let's look at some of the users of domain fronting (sorry, I said advantages at first; I messed that up). I mentioned Signal Messenger. They primarily use it for bypassing censorship. Lantern seems to do the same; I think that's some kind of VPN. Then there's meek-client, part of meek, which is one of Tor's pluggable transports, like obfsproxy. I presume pretty much everybody here has heard of Tor, which is usually used for, you guessed it, bypassing censorship, but also for evading detection. Pluggable transports are basically different plugins for Tor that let your traffic pretend to be different types of traffic. There are all kinds of interesting ones. I wish I could go into them today, but that's a huge rabbit hole. And then there's malware, which uses it for bypassing censorship too. Obviously censorship isn't just a country saying no, you can't go to this website; it's also a corporation saying no, you're not going to Facebook at work. Malware also uses it for evading detection: malware doesn't want to be seen going to hey-you-just-got-hacked.com. So, advantages. Why is it useful? Well, it can't be detected without breaking TLS. Someone has to actually perform some kind of man-in-the-middle attack to be able to see that you're doing it. One potential case: if I'm running a corporate network and I don't want to invade the privacy of my staff by intercepting their TLS communications, I won't be able to tell that this is happening, so a piece of malware could use it to evade my firewall. It also uses existing infrastructure on the server side. When I've been playing around with this, it's mostly been CDNs like CloudFront, and not Cloudflare anymore, and not Google Cloud anymore either. And it's compatible with anything that can be tunneled over HTTPS: if you can think of a way of tunneling your traffic over HTTPS, you can use domain fronting to send that traffic around. And it's very simple and easy.
It sounds complicated, but once you realize how simple it is, you'll walk away and go, cool, I'm going to go do that at home, and it will take you five minutes. Okay, where is it not useful? Bypassing censorship where the TLS is man-in-the-middled: if I break your TLS session, I can see you doing this, as I said before. And it's not useful for hiding web traffic in a normal web browser. You can do it in a normal web browser, but each web page you go to loads things from all kinds of different places, and you'd need to find a suitable domain to hide behind for every one of them; not every website will have domains that match. So yeah, the destination needs to be on shared infrastructure, and a front domain for that destination needs to be known. I'll explain what I mean by front domain a bit more when I go into the demo. And you lose TLS's usual security guarantees, because you basically need to trust the certificate of a different site from the one you're actually talking to. Again, we'll see more in the demo. So how does it work? Let's go back to the days of HTTP 1.0. You're trying to browse to www.mywebsite.com, so your web browser does a DNS lookup, your DNS server returns a server address, you connect to that web server and say, hey, get me the home page, and it returns it. The problem here, of course, is that we needed one IP address per domain we were hosting. If you wanted to host two websites, you needed two different IP addresses, and some of us couldn't do that; and with many millions of websites we would run out of IPv4 addresses even faster than we currently are. So HTTP 1.1 introduced the Host header. That changed this sequence diagram only slightly: it just added one extra piece of information on the second green arrow here, where we specify, hey, I want website A or website B, so that both sites can share one IP address. But it's 2018, and we encrypt web traffic, or we're supposed to.
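To make the Host header idea concrete, here is a minimal sketch of name-based virtual hosting: one server (one IP address) reads the Host header from a plain HTTP 1.1 request and picks which site to serve. The site names and page contents are hypothetical, and the parser deliberately ignores everything except the Host header.

```python
from typing import Optional

# Hypothetical sites that all share one IP address.
SITES = {
    "website-a.example": "<h1>Website A</h1>",
    "website-b.example": "<h1>Website B</h1>",
}


def parse_host(raw_request: bytes) -> Optional[str]:
    """Return the Host header value, lowercased and without a port, or None."""
    head = raw_request.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    for line in head.split("\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            # Strip an optional ":port" (IPv6 literals are ignored for brevity).
            return value.strip().split(":", 1)[0].lower()
    return None  # an HTTP 1.0 client may not send Host at all


def serve(raw_request: bytes) -> str:
    """Choose which site to serve using only the Host header."""
    host = parse_host(raw_request)
    if host is None:
        return "400 Bad Request: no Host header"
    return SITES.get(host, "404 Not Found: unknown site")


if __name__ == "__main__":
    print(serve(b"GET / HTTP/1.1\r\nHost: website-b.example\r\n\r\n"))  # <h1>Website B</h1>
```

Note that the routing decision depends entirely on a value the client chose; that is the same property domain fronting later leans on.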
So we have this thing called TLS. SSL was its original name, and it's now known as TLS. It basically takes HTTP as it was and wraps it in a secure layer. That layer validates who you're talking to (and in some cases it can validate you yourself, if you decide to do mutual, bidirectional authentication): am I talking to my bank, or am I talking to someone who claims to be my bank? It takes care of that for you, and it encrypts the traffic between you and the server, as we know. And it does that without changing the protocol: as the application developer, it doesn't matter if I'm using HTTP or IMAP or POP or whatever, because TLS was designed to just slot in underneath. Sorry, I'm getting ahead of myself here. The TLS handshake is pretty complicated, and I don't understand anything but the first two messages that are happening there, and the rest isn't relevant for domain fronting. Basically, in your TLS handshake your client says, hey, I want to talk TLS, and the server goes, sure, I am your bank, definitely your bank and not someone else. Then they do some kind of special packet dance and arrive at encryption keys that they both agree on, and from there everything gets encrypted. It's a layer that's hidden from me as the application developer: everything from that point is encrypted, and to me it just looks like the same protocol I came up with, in this case HTTP. So forget about all that stuff I don't know; let's pretend it's not there and just focus on the first couple of messages. Now, say I'm sitting on a network watching your traffic. Maybe I'm your Starbucks Wi-Fi, maybe I'm your ISP, maybe I'm a government, maybe I'm the sysadmin at your company. I can't see what's happening down the bottom here; that part's encrypted, obviously. But what I can see is these first few messages, which are unencrypted. The client hello actually has a part at the beginning that says, hey, I want to go to google.com.
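As an illustration of what a passive observer can read, here is a sketch that asks Python's `ssl` module to produce a real ClientHello entirely in memory (via `MemoryBIO`, so no network is touched) and then pulls the hostname out of it by walking the record byte by byte. It assumes the ClientHello fits in the first TLS record, which is typical, and only handles the `host_name` entry of the server_name extension.

```python
import ssl
import struct
from typing import Optional


def client_hello_for(hostname: str) -> bytes:
    """Return the first TLS record this client would send to `hostname`."""
    ctx = ssl.create_default_context()
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        pass  # expected: ClientHello written, now waiting for a server reply
    return outgoing.read()


def extract_sni(record: bytes) -> Optional[str]:
    """Read the SNI host_name from a cleartext ClientHello, as a sniffer could."""
    if len(record) < 5 or record[0] != 0x16:  # 0x16 = handshake record
        return None
    body = record[5:]  # skip record header: type(1), version(2), length(2)
    if not body or body[0] != 0x01:  # 0x01 = ClientHello
        return None
    pos = 4 + 2 + 32  # handshake header(4), client_version(2), random(32)
    pos += 1 + body[pos]  # session_id length byte + session_id
    (suites_len,) = struct.unpack_from("!H", body, pos)
    pos += 2 + suites_len  # cipher suites
    pos += 1 + body[pos]  # compression methods
    (ext_total,) = struct.unpack_from("!H", body, pos)
    pos += 2
    end = pos + ext_total
    while pos + 4 <= end:
        ext_type, ext_len = struct.unpack_from("!HH", body, pos)
        pos += 4
        if ext_type == 0x0000:  # server_name extension
            # list_length(2), name_type(1, 0 = host_name), name_length(2), name
            (name_len,) = struct.unpack_from("!H", body, pos + 3)
            if body[pos + 2] == 0:
                return body[pos + 5 : pos + 5 + name_len].decode("ascii")
        pos += ext_len
    return None


if __name__ == "__main__":
    print(extract_sni(client_hello_for("www.google.com")))  # www.google.com
```

Nothing here needs keys or a man-in-the-middle position: the hostname is sitting in the clear in the very first message.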
The reason the hostname is in there is the same problem we saw going from HTTP 1.0 to HTTP 1.1. When I say, hey, I want to talk TLS, the server immediately needs to know who it is representing. If I have website A and website B on the same server and my client says, hey, I want to talk TLS, the server needs to send back a certificate saying either I'm website A or I'm website B, and it has to do that before any HTTP, and therefore any Host header, has been sent. So TLS had to be extended slightly to identify which server we're talking to: am I talking to google.com, or to yahoo.com? That's the SNI (Server Name Indication) extension. I can't remember exactly when the Xbox 360 came out, but it didn't support SNI; I just know that if you want to support the Xbox 360, you need one IP per site. So that's roughly how long ago it got introduced, but support wasn't universal back then. So say I'm on your network, sniffing your traffic and watching you use HTTPS. I can see that you're going to google.com, because it's sitting there in the very first message you send in your TCP session to Google. So, just to recap: the entire contents of the HTTPS session are encrypted, which is all this stuff: the path, the headers, the body. Early in my web development career, none of this was clear to me; all this stuff just somehow magically arrived and you got it from $_POST in PHP somewhere. What's not encrypted is the domain name of the server we're talking to, which is exposed by SNI. That's how your ISP can see what you're looking at even though you're using HTTPS. The server certificate also isn't encrypted, although I think TLS 1.3 is supposed to encrypt the server certificate. Can anyone here confirm? No? [Audience member: It does.] Thanks. Cool. So let's take a look at HTTPS stacks.
With domain fronting, what makes it work is the implementation of the HTTPS stacks and how they usually get shared across customers, say if you're using Amazon CloudFront. First, a very simple single-server stack. When I was 14, I used to run a web server under my desk which was just Apache, and that's basically this first picture. On the left is a client, then we've got the web server, which is something like Apache, nginx, or IIS, and that pulls content from an origin. The origin is generic because it can be a file system, a Rails application, a Django application; it doesn't matter. This is just to show the separation of concerns. This web server is responsible for HTTP, TLS, and caching, and when you're running one website it's all pretty simple. What we often like to do, though, usually for performance reasons, is delegate the TLS. TLS is reasonably heavy on the CPU, and what you're doing in your application might be very heavy on memory, so we split those apart. This is often what's happening at somewhere like CloudFront or Google Cloud: they have lots of customers, and they want a very fast endpoint, with an IP that lots of customers share, that just does the job of terminating TLS connections and passing them on to the next link in the chain, which is another web server that goes and fetches the content. And when we're delegating, this is where I get to steal that little NSA graphic I love. I think this is roughly how Heroku works, and that's a shared infrastructure provider too. I haven't tried whether domain fronting works on Heroku yet; that's probably something I'll do right after this. Basically you talk to a reverse proxy whose job is to terminate TLS, and then it passes the request on to something else whose job is to look at the Host header in the request and route it to whichever site the Host header matches.
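The TLS-terminating reverse proxy has to pick a customer's certificate before it has read any HTTP, so the only thing it can go on is SNI. As a rough sketch of that first layer, Python's `ssl` module exposes `SSLContext.sni_callback`, which is called with the connection object, the SNI name (or `None`), and the original context, and may swap in a different context (and therefore certificate). The hostnames and certificate file paths are hypothetical; this is not how any particular CDN implements it.

```python
import ssl
from types import SimpleNamespace
from typing import Callable, Dict, Optional, Tuple


def make_sni_selector(per_host: Dict[str, ssl.SSLContext]) -> Callable:
    """Build a callback for SSLContext.sni_callback.

    It runs while the ClientHello is processed, i.e. before any HTTP is read.
    """

    def select(ssl_obj, server_name: Optional[str], original_ctx: ssl.SSLContext):
        # server_name is None if the client sent no SNI at all.
        if server_name is not None:
            ctx = per_host.get(server_name.lower())
            if ctx is not None:
                # Switching the context switches which certificate is presented.
                ssl_obj.context = ctx
        return None  # None means "continue the handshake"

    return select


def build_terminator(certs: Dict[str, Tuple[str, str]]) -> ssl.SSLContext:
    """certs maps hostname -> (certfile, keyfile); the paths are hypothetical."""
    per_host = {}
    for host, (certfile, keyfile) in certs.items():
        ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
        ctx.load_cert_chain(certfile, keyfile)
        per_host[host.lower()] = ctx
    default_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    default_ctx.sni_callback = make_sni_selector(per_host)
    return default_ctx


if __name__ == "__main__":
    # Offline check of the selection logic, using a stand-in for the SSLObject.
    site_a = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    default_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    conn = SimpleNamespace(context=default_ctx)
    make_sni_selector({"website-a.example": site_a})(conn, "Website-A.example", default_ctx)
    print(conn.context is site_a)  # True
```

Nothing in this layer ever looks at the Host header; in this sketch, routing the decrypted request is left entirely to whatever sits behind it.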
So what often happens in CDN networks is that the reverse proxy decides which certificate to send back to the client based on the SNI value (that thing you saw in my Wireshark screenshot), and then the next link in the chain decides where to send the request based on the Host header. The two don't always have to match, and in domain fronting we deliberately make sure they don't. So (I'm sorry, this is way too small to see on the screen) if we make a request to www.good.com, the SNI says, hey, I want the certificate for good.com, and my TLS session gets established. Then I say, okay, get me the home page, with the Host header set to www.evil.com. What ends up happening is that the reverse proxy satisfies the TLS session, then the router looks at the Host header and goes, I want to go to evil.com. This seems to be roughly how it works behind the scenes at Amazon. Unfortunately I don't work there and I don't know anyone who does; I would love to know why it works there. So I've already ruined that one. And remembering what we saw in Wireshark: if I'm sniffing the network, I would only see packets going to www.good.com. Why does it work? An anonymous Google software engineer said it worked because of a quirk of their software stack, and Google has since done what they can to get rid of domain fronting. So it depends heavily on the implementation. For it to work, the shared infrastructure must not check for a mismatch between the SNI value and the Host header; apparently that check is what Cloudflare does to stop it, and it kind of makes sense to do that. Also, HTTP requests must be routed separately from where TLS is terminated, so those two layers need to be handled separately. So, to put it together, what do we actually need to do? All we need to do is connect like normal to one host and set the Host header to another, and both hosts have to be on the same infrastructure, which can route between them.
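The two-layer behaviour described above can be captured in a toy model: layer 1 accepts the connection if it has a certificate for the SNI name, and layer 2 routes purely on the Host header. The `reject_mismatch` flag models the SNI-versus-Host comparison mentioned as a countermeasure. The class and its responses are illustrative assumptions; `421 Misdirected Request` is a real HTTP status code, but whether any given provider returns it for this check is not something the talk establishes.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SharedEdge:
    """Toy model of shared CDN infrastructure with two separate layers."""

    certificates: Dict[str, str]  # layer 1: SNI name -> certificate presented
    origins: Dict[str, str]  # layer 2: Host header -> customer origin
    reject_mismatch: bool = False  # countermeasure: compare SNI and Host

    def handle(self, sni: str, host: str) -> str:
        # Layer 1 (TLS terminator): sees only the cleartext SNI.
        if sni not in self.certificates:
            return "TLS alert: unrecognized name"
        # Layer 2 (router): sees the Host header, which arrived encrypted.
        if self.reject_mismatch and host != sni:
            return "421 Misdirected Request"
        return self.origins.get(host, "404 Not Found")


if __name__ == "__main__":
    edge = SharedEdge(
        certificates={"www.good.com": "certificate for good.com"},
        origins={"www.good.com": "good.com origin", "www.evil.com": "evil.com origin"},
    )
    print(edge.handle("www.good.com", "www.good.com"))  # good.com origin
    print(edge.handle("www.good.com", "www.evil.com"))  # evil.com origin
```

A real comparison would also need to normalise case and ports; the exact string match here is only for brevity.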
So evil.com needs to be accessible via the same infrastructure as something innocent-looking, and the infrastructure needs to have the right implementation quirks. There are plenty of websites on popular CDNs; jQuery is on a CDN, for example. I could go and sign up for that same CDN myself and put my malware command and control server behind it, which would let me use somebody else's domain on that same CDN to reach it. I keep giving away the... So, finding them: there are loads of websites out there. If we take, say, the Alexa top 500, we should be able to just do DNS lookups: google.com points to this, whatever; this customer is using Akamai. I think Facebook uses Akamai too, which would be another good one to try. And they're easy enough to find and sign up for. So what makes some domains better than others? It really depends on what we're trying to do. If we're evading detection like malware, we want something that looks like business as usual: if I'm in a company that sells apples, it probably doesn't look very suspicious if I'm going to fruit.com. Or maybe something innocuous: if I've infiltrated a company and I'm trying to exfiltrate some data, and that company is a marketing company that uploads a lot to YouTube, maybe I could hide my stuff behind youtube.com. If I'm in a country that blocks access to sites, which was Signal's problem (they were dealing with countries blocking their messenger), you want collateral damage. They chose an e-commerce website, because if that country were to block that site, it would negatively impact the country as a whole, and that might be a reason it wouldn't get blocked. And maybe you can combine all of these: find something that looks like business as usual, is innocuous, and has collateral damage. That's probably a really good one to go with. So I've talked a lot, and I'm going to keep talking.
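For the "do DNS lookups on a top-sites list" step, a sketch might resolve each candidate and group domains by the CDN their CNAME chain points into. The suffix table below is an assumption for illustration only: it is incomplete, providers rename things, and sharing a CDN does not guarantee fronting works, which still has to be tested by hand. `socket.gethostbyname_ex` returns the canonical name, the aliases, and the addresses the system resolver reports.

```python
import socket
from typing import Dict, List, Optional

# Assumed, incomplete mapping of CNAME suffixes to CDNs (illustrative only).
CDN_SUFFIXES: Dict[str, str] = {
    ".cloudfront.net": "Amazon CloudFront",
    ".akamaiedge.net": "Akamai",
    ".edgekey.net": "Akamai",
    ".fastly.net": "Fastly",
}


def classify(names: List[str]) -> Optional[str]:
    """Return the CDN whose suffix matches any name in a CNAME chain."""
    for name in names:
        name = name.lower().rstrip(".")
        for suffix, cdn in CDN_SUFFIXES.items():
            if name.endswith(suffix):
                return cdn
    return None


def dns_names(domain: str) -> List[str]:
    """Live lookup: canonical name plus aliases, as reported by the resolver."""
    canonical, aliases, _addresses = socket.gethostbyname_ex(domain)
    return [canonical, *aliases]


def group_by_cdn(domains: List[str]) -> Dict[str, List[str]]:
    """Group candidate front domains by the CDN they appear to sit behind."""
    groups: Dict[str, List[str]] = {}
    for domain in domains:
        try:
            cdn = classify(dns_names(domain))
        except OSError:  # socket.gaierror is a subclass of OSError
            continue
        if cdn is not None:
            groups.setdefault(cdn, []).append(domain)
    return groups
```

Usage would be something like `group_by_cdn(["example.com", "example.org"])`, which needs network access; `classify` on its own runs offline.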
But instead of boring slides, let's fire up this SSH session, and I'm going to try to talk and hold the mic at the same time. One second. All right, what we're going to start with: I have a root shell here, and I'm going to start a tcpdump so you can see the traffic that's actually leaving this machine. This is just an empty EC2 instance in Amazon, so there should hopefully be no other traffic on port 443. For those not familiar with tcpdump, it captures network traffic going through your network card and logs it somewhere. I'll briefly go over the arguments. -c 4 means stop after four packets: three for the TCP handshake, and the fourth is our first message from client to server, the client hello, which has the SNI in it. -A means print each packet as ASCII straight to the terminal. -n is a performance optimization: don't do reverse DNS lookups on the IP addresses. -i eth0 is our interface, and then we're filtering for TCP port 443. So let's run that and switch to another screen. Then we're going to run curl against picsum.photos. It's like Lorem Ipsum, but for pictures. curl -s just means be silent: don't print anything except the response, to standard out. Then we pipe that to md5sum so we can see what content we actually got. The hash of the content in this case starts with C8EAD. So the purpose of this is basically: go get me this web page and hash it, so I've got an idea of what I saw. Then let's get another web page, in this case protectyourprivacynow.com, and obviously it's a different hash. So these are the two hashes, and when I do the domain fronting we'll see which one I get. Hopefully this makes sense. Now I've just realized I've messed up my tcpdump. Okay, so only the first curl request showed up.
Only the first curl request got dumped here, because tcpdump stopped after four packets. You can see right here the SNI saying, hey, I'm going to picsum.photos. Let's run the tcpdump again, switch to the other screen, and run this curl command for protectyourprivacynow.com. On the other side, in roughly the same place, you can see I'm going to protectyourprivacynow.com. All right, and it's in my history because I was testing it out. Cool, so let's actually do some domain fronting; let's get tcpdump running again. Okay, so all we've done here is establish our TLS connection with protectyourprivacynow.com as before, so most of this curl command is the same as the one above it. The only difference is that we're telling curl, with the -H flag, to change the Host header to picsum.photos. When I run this, the MD5 sum of what came back matches picsum.photos, but if I look at the other side, where I've dumped the packets, it only shows protectyourprivacynow.com. So when I said it was really easy and you can go and do it in five minutes at home, there it is. There are a few risks here, though. One of them is reliability. When I go to protectyourprivacynow.com, I don't know that they're going to keep it on the same CDN as picsum.photos. They could change that at any time and point it somewhere else. One potential solution is to have a list of backup domains: if you were shipping an app that relied on domain fronting, you'd want a list of other front domains that still work.
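For readers who would rather script the demo than type curl, here is a rough Python equivalent of `curl -s -H "Host: picsum.photos" https://protectyourprivacynow.com/ | md5sum`, using the standard library's `http.client`. `HTTPSConnection` puts its `host` argument into SNI and validates the certificate against it, while a caller-supplied `Host` header stops `http.client` from adding its own. This is a sketch: it needs network access, does not follow redirects (curl without `-L` doesn't either), and the domain pairing from the demo will quite possibly no longer work today.

```python
import hashlib
import http.client


def fronted_get(front: str, hidden: str, path: str = "/") -> bytes:
    """Connect (SNI + certificate check) to `front`, but ask for `hidden`."""
    conn = http.client.HTTPSConnection(front, timeout=10)
    try:
        # The Host header is only sent after TLS is up, so it travels encrypted.
        conn.request("GET", path, headers={"Host": hidden})
        return conn.getresponse().read()
    finally:
        conn.close()


def md5_hex(data: bytes) -> str:
    """The same digest md5sum prints; used only to tell pages apart."""
    return hashlib.md5(data).hexdigest()


if __name__ == "__main__":
    # Network required; results depend on both sites still sharing a CDN.
    print(md5_hex(fronted_get("picsum.photos", "picsum.photos")))
    print(md5_hex(fronted_get("protectyourprivacynow.com", "picsum.photos")))
```

Comparing the two printed hashes mirrors the comparison in the demo; a packet capture taken during the second call should show only `protectyourprivacynow.com` in the client hello.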
But a bigger problem is that you can't validate that the server is authentic. If you're connecting to protectyourprivacynow.com but really intending to talk to picsum.photos, the server whose identity you validated is protectyourprivacynow.com, not the photo site you're trying to reach. And since you're connecting to a domain you didn't set up yourself, its owner could change that certificate or CA at any time, so you can't pin it either, which you'd normally be able to do with a self-signed cert or whatever. Also, the traffic is visible to the infrastructure provider, say if you had traffic that was sensitive or that you needed to hide from Amazon CloudFront, for example. Your TLS session ends at the provider's edge, so the infrastructure provider can, by default, see everything you sent. It's kind of obvious if you're used to using a CDN, but when you're doing tricks like this it's easy to forget about these things. Sensitive data could be stolen and malicious payloads could be injected, so basically treat it as an unencrypted connection: encrypt and sign all your messages, and maybe come up with your own way of validating that you're really talking to who you think you're talking to. Cool. So I actually saw a DoS attack happen via domain fronting. It was a combination of some not-quite-correctly configured infrastructure and domain fronting.
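Back to the advice to treat a fronted connection as unencrypted and sign your messages: a minimal sketch of the "sign" half uses HMAC-SHA256 from Python's standard library with a pre-shared key that is assumed to be provisioned out of band and never sent through the CDN. This gives integrity and authenticity only. It does not hide the content (that would additionally need an authenticated cipher such as AES-GCM, for example from the third-party `cryptography` package, not shown) and it does not stop replays.

```python
import hashlib
import hmac
import json
import secrets

# Hypothetical pre-shared key, distributed out of band (e.g. built into the client).
KEY = secrets.token_bytes(32)


def seal(message: dict, key: bytes = KEY) -> bytes:
    """Serialize `message` and prefix it with a hex HMAC-SHA256 tag."""
    body = json.dumps(message, sort_keys=True).encode("utf-8")
    tag = hmac.new(key, body, hashlib.sha256).hexdigest().encode("ascii")
    return tag + b"." + body


def open_sealed(blob: bytes, key: bytes = KEY) -> dict:
    """Verify the tag before trusting anything; raise if it doesn't match."""
    tag, _, body = blob.partition(b".")
    expected = hmac.new(key, body, hashlib.sha256).hexdigest().encode("ascii")
    if not hmac.compare_digest(tag, expected):  # constant-time comparison
        raise ValueError("bad signature: message was modified or forged")
    return json.loads(body)


if __name__ == "__main__":
    blob = seal({"cmd": "ping"})
    print(open_sealed(blob))  # {'cmd': 'ping'}
```

If the provider (or anyone who has broken the outer TLS) alters a byte of the body, `open_sealed` raises instead of returning attacker-controlled data.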
What happened in that DoS attack was this. There was a very normal web server setup: you start with one web server, you refactor your application and make it two web servers, you eventually end up with a cluster of them, and then you go, okay, we need a CDN. So it was put behind a CDN, and the CDN's IPs were marked as trusted proxies on the web server, as you do, because the client's TCP connection hits the CDN endpoint and the CDN then makes a separate connection to the web server. Then attackers found a nice slow web page and decided to make thousands of requests to it, but they used domain fronting. With the CDN that was being used, for some reason, the fronted requests went between two different CDN IPs: from one CDN endpoint to another and then to the web server, which the web server didn't expect. Normally the connection diagram looks like this (for those who can't see, the IP addresses are a bit small). The client, way over on the left at 1.1.1.1, connects to the reverse proxy; that's one TCP connection. The reverse proxy, at 2.2.2.2, then makes its own connection to the web server. The web server can only see the TCP connection from 2.2.2.2, so the reverse proxy adds an X-Forwarded-For header saying, hey, I'm forwarding this for 1.1.1.1, and because the actual TCP connection comes from the trusted 2.2.2.2, the web server can trust that 1.1.1.1 is really who it's talking to. In this case, though, the user was hitting the front domain, and the CDN endpoint behind it added X-Forwarded-For: 1.1.1.1. The request then got forwarded to the other endpoint, which added another X-Forwarded-For entry before passing it on to the web server, which didn't understand that there could be more than one.
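To see why two entries confused the web server, here is a sketch contrasting the buggy assumption (exactly one proxy, so take the last X-Forwarded-For entry) with walking the chain from the right and skipping every trusted proxy, which is similar in spirit to nginx's `real_ip_recursive`. The client 1.1.1.1 and proxy 2.2.2.2 come from the talk's diagram; the second CDN endpoint at 3.3.3.3 and the trusted ranges are hypothetical.

```python
import ipaddress
from typing import Iterable, List


def parse_xff(header: str) -> List[str]:
    """Split an X-Forwarded-For value into its comma-separated entries."""
    return [part.strip() for part in header.split(",") if part.strip()]


def naive_client_ip(peer: str, xff: str) -> str:
    """The buggy assumption: exactly one proxy, so the last entry is the client."""
    entries = parse_xff(xff)
    return entries[-1] if entries else peer


def real_client_ip(peer: str, xff: str, trusted: Iterable[str]) -> str:
    """Walk from the TCP peer leftwards, skipping trusted proxies."""
    networks = [ipaddress.ip_network(net) for net in trusted]

    def is_trusted(addr: str) -> bool:
        try:
            ip = ipaddress.ip_address(addr)
        except ValueError:
            return False  # garbage in the header is never treated as a proxy
        return any(ip in net for net in networks)

    chain = parse_xff(xff) + [peer]  # peer = the actual TCP source address
    for addr in reversed(chain):
        if not is_trusted(addr):
            return addr  # first hop that no trusted proxy vouches beyond
    return chain[0]  # every hop was trusted; fall back to the leftmost entry


if __name__ == "__main__":
    # Fronted path from the talk: 1.1.1.1 -> CDN endpoint -> CDN endpoint 2.2.2.2 -> server
    print(naive_client_ip("2.2.2.2", "1.1.1.1, 3.3.3.3"))  # 3.3.3.3 (a CDN endpoint!)
    print(real_client_ip("2.2.2.2", "1.1.1.1, 3.3.3.3", ["2.2.2.0/24", "3.3.3.0/24"]))  # 1.1.1.1
```

With the naive version, rate limiting or banning keyed on the "client" IP ends up punishing a CDN endpoint, which is exactly the failure described next.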
Those two entries basically opened up an IP spoofing vulnerability, because the web server thought the user's true IP was in fact the IP of a CDN endpoint. What we usually expect is that the header holds one address, the true IP of the user (or at least that's what the web server was expecting), and the web server reads that header and knows who it's talking to. In actual fact there were two; it didn't understand that, and it assumed that the CDN was the real user. It got worse, because our fail2ban config ended up banning the whole CDN and taking the site down, since it thought the CDN was DoSing it. The root cause was a misconfiguration: we'd forgotten to look at the full list, and because the traffic came through domain fronting, a path nobody expected, that assumption was invalidated. So to plug that: know your infrastructure. X-Forwarded-For is only a de facto standard, and your infrastructure provider might do it differently from how you expect; I think Cloudflare uses a completely different header, CF-Connecting-IP. And wherever there's a possibility of getting proxied multiple times, make sure the chain is trusted. If you've got a CDN endpoint that maybe hits another one, which hits a load balancer, which hits another load balancer, the web server needs to be able to draw a line through all those endpoints and go, okay, I trust this one, which trusts that one, and find its way to the true IP. In nginx that's real_ip_recursive, if you want to Google it; I don't know the equivalents in other servers, unfortunately. And Cloudflare has CF-Connecting-IP. So, the future of domain fronting. I don't think it'll last much longer. Cloudflare has already said they're getting rid of it. It can't be relied upon, because it depends on implementation details which could change. Like I mentioned before, Amazon might just suddenly start doing something differently one day.
Netlify, Beluga, all those CDN providers, whatever you use, might find a more efficient way of doing it or decide to get rid of it. Different regions of the world might also use different infrastructure. I noticed this when I was in China: I went to apple.com, wondered where that was going, and had a look, and it was completely different from the result I was getting from Perth. It was Akamai in one place and some other CDN elsewhere. So if you were writing something that needs to be used all over the world, your front domains might point to different infrastructure in different places. It's also possible to actively prevent it from working, which Cloudflare is deliberately doing by checking for a mismatch between the SNI and the Host header. Another way is to use the SNI for routing the request: remember the several layers that decide on the final destination; the router could simply use the SNI value. I believe that was the intent behind HTTP 2. It seems domain fronting is unwanted: Amazon and Google appear to have responded to pressure against it. Google has already broken it deliberately, and Amazon said they will. And one provider's stated reason is that it's a risk to their customers: "It would put our traditional customers at risk." I presume that means customers behind corporate firewalls, because those firewalls can't necessarily block malware traffic if it's using domain fronting. So domain fronting may not be that useful for keeping your own personal HTTPS traffic private. So, a couple more suggestions. TLS 1.3 explored the possibility of encrypting the SNI. It didn't make it into the spec; there were actually two proposals: one needed an extra round trip to negotiate a key first, and the other, I think, was based on a static key. I didn't really pay much attention to it. However, I did read recently that there's been work done on this, by Cloudflare I think.
And that only came out in the last couple of days, I think. Otherwise, use a conventional tunnel like a VPN, if you set the tunnel up yourself. Or tunnel traffic other ways; there are other protocols out there for achieving this. I've seen some pretty old-school malware doing command and control over IRC. A friend of mine tried to do IP over Facebook Messenger; that was pretty horrible. HTTP over XMPP. Just coming up with crazy, silly ideas can be pretty fun. But I hope you all walk away from here understanding how SNI facilitates sharing infrastructure that uses TLS, working around that problem of needing one IP per host; how third parties can see where your HTTPS traffic is going without doing a man-in-the-middle attack; how to find domains and actually do domain fronting yourself; and how to protect yourself from misconfigurations like the one I saw, where the infrastructure just didn't do what was expected. Thanks for having me. Any questions?