Hi, as introduced, I'm Danny Cooper, and this is Alan Worth. We're both at Akamai, and we're here today to talk about DNS rebinding, which some of you may have heard of and some of you might not have. It's actually a fairly old vulnerability, but it's the gift that keeps on giving. Very briefly: we're both security researchers at Akamai, and we're usually administrators and authors for BKPCTF. We had an off-year this year and didn't run it, but we're coming back, I promise, and I hope I see a bunch of you at BKPCTF, because we'll be playing and it'll be good to see you all. So let me present a scenario. Suppose you're browsing the internet in its current state: you go to some website and it loads a ton of third-party JavaScript, but your OS is up-to-date-ish and your browser is up to date. What sort of bad things do you expect to happen? Well, obviously you might get popped by a 0-day, or there might be CSRF, or maybe someone's going to Rowhammer you from JavaScript, or do some other awful CPU- or memory-based attack. But would this outcome surprise folks: your router joining a botnet, and your fridge starting to mine Monero? I personally would find this surprising, not because it's that surprising that a router is in a botnet or a fridge is mining Monero, but because of the pivot factor: a web-based attack has ended in an internal network compromise. But it turns out you can do this kind of thing using DNS rebinding, which combines some very particular DNS server behavior with JavaScript running in the end user's browser. What this lets you do is skirt the same-origin policy: it lets you make requests against an unexpected second domain (a second IP address, rather) that your browser will treat as belonging to the origin that served your malicious JavaScript.
In the wild, there have been some pretty neat write-ups recently. Actually, the write-up that got me and Alan to start looking into this, and to write our tools for performing these attacks, was from Tavis Ormandy. He found a bunch of really cool problems and put them up on Project Zero. The most notable was probably the Blizzard BattleNet agent, which, if you don't know, is an application that you install and that keeps your Blizzard games up to date. What Tavis found was that by using a DNS rebinding vulnerability he could get full RCE just by having someone visit a website, after which he got the agent to load a DLL that he supplied. There were similar vulnerabilities in two BitTorrent clients, uTorrent and Transmission, I believe, which had JSON-RPC interfaces exposed over localhost that you were able to hit with DNS rebinding. And my personal favorite was the Electrum wallet, where you couldn't get RCE, but you could steal all of someone's money. Shout out to Brannon Dorsey. I don't personally know this person, but while we were doing our research we came across his quite similar DNS server implementation. There are different features between the two, but if you're looking for a parallel-evolution implementation, check out his work on GitHub. So what we'll be talking about today: first, I'm going to give some background on how this attack class actually works and dive into the gritty specifics. Then we'll talk about the tool we built, why you might want to use it, and what benefits it has. Next up, we'll talk about all of the really horrible optimization work that goes into making a DNS rebinding attack fast. And finally, we'll talk about areas that we think could use more attention and are ripe for future work. So first up, let's talk about the normal operation of a canonically vulnerable application.
So what you have here is a service running on your computer that for some reason is divided into multiple processes, and for whatever reason the designer of the service has decided to use a localhost web server as the RPC medium between those processes. This is actually not that uncommon a design pattern, or maybe anti-pattern, depending on how you feel about it, partially because it's rather portable: it's going to work on OS X and Windows and Linux out of the box. So there are reasons to do this, and people aren't wrong for wanting something that works. Basically, Process A wants to do some sort of IPC, in this case triggering an action by Process B, so it POSTs across localhost. And B, because it's only listening on localhost, thinks this is a safe channel and executes the requested action. Now, a first-pass attempt at exploiting this: you might just, from attacker.com, serve some JavaScript and have that JavaScript make a GET or a POST against localhost. And if you don't need to set HTTP headers and you don't need to retrieve the response body from your target, this could work. This is not DNS rebinding; this is just basically CSRF. But because of the same-origin policy, if you do need to set those headers or retrieve the response body, this won't work. As a short reminder, the same-origin policy basically means that JavaScript served from one place can only do a very limited set of things to domains other than the one it was served from. Now, let's talk about how we would attack something. First, you get the user to go to your attacker.com. The end-user browser issues a request to the local resolver to look up attacker.com, and that local resolver goes forward to a recursive resolver. In this diagram the recursive is external; you could be running your own, it doesn't matter.
And that recursive then goes forward to our adversary-controlled authoritative DNS server, because we control that zone. So that request comes to us. I've highlighted a detail here: it would have to be something like bob.attacker.com to keep the names unique. We recognize that we haven't owned bob yet, so we send back 1.2.3.4 (let's pretend we control that IP) and say that the result is valid for the next second. That answer, 1.2.3.4 with a one-second TTL, percolates back through the recursive to the local resolver and to the browser. The browser now has an IP address, makes a request to the adversary web server, and we serve back some JavaScript. So far, nothing that weird has happened. The short DNS TTL is a little unusual, but it's not that far beyond the pale; there are legitimate reasons to have fairly short DNS TTLs, so nothing here was really abnormal browsing so far. So let's talk about the attack. What happens next is that our JavaScript tries to make a request against attacker.com again, like an Ajax request. That gets delegated to the browser, which sees a request for attacker.com. It's the same origin, so we can retrieve the body just fine, we can set headers, we can do whatever we want. Now, hopefully what happens is that the browser triggers the name lookup and finds that the record has expired, because we set it with such a short TTL. If this doesn't work, because your local cache says, "No, I'm not going to hold a record for only a second," you can just keep requesting until it does expire. So the lookup goes back out, your recursive's record has hopefully also expired, and the recursive makes a request to the DNS server we control. This time around, we know that we've already heard from this target, so we return a different result: 127.0.0.1, with a long validity window.
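The answer-flipping logic just described can be sketched as a small stateful function. This is an illustration of the technique, not the tool's actual code; the struct and field names are made up, and real DNS answer generation would sit inside a DNS server library.

```go
package main

import "fmt"

// rebinder tracks how many times each attack (keyed by a per-attack UID)
// has been queried, mirroring the stateful DNS server described above.
type rebinder struct {
	seen map[string]int
}

// answer returns (ip, ttlSeconds) for one query. The first `hits` queries
// get the attacker web server with a 1-second TTL; every query after that
// gets the target IP with a long TTL, so the browser keeps talking to it.
func (r *rebinder) answer(uid, firstIP, secondIP string, hits int) (string, int) {
	r.seen[uid]++
	if r.seen[uid] <= hits {
		return firstIP, 1
	}
	return secondIP, 3600
}

func main() {
	r := &rebinder{seen: map[string]int{}}
	for i := 0; i < 3; i++ {
		ip, ttl := r.answer("bob", "1.2.3.4", "127.0.0.1", 2)
		fmt.Println(ip, ttl) // first two queries: 1.2.3.4 1; third: 127.0.0.1 3600
	}
}
```

The important design point is that the state lives server-side, keyed per target, which is what lets the same hostname mean two different IPs at two different moments.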
Now, what happens is that the browser gets back "attacker.com is located at 127.0.0.1," and then happily goes forward on localhost, because it treats it as the same origin as the one we served our JavaScript from. This means you can do whatever badness you would like to do. In terms of scope of exploitation, it really depends on the target application. In some of the cases I mentioned, there was RCE; with the Electrum wallet, you could only steal money. In other cases, you might only be able to moderately annoy the end user. It depends on what capabilities are exposed by the RPC. In terms of mitigations, or reasons your attack might not work, there are a couple. The easiest is that if you're a developer designing a system like this, you can just check the Host header on the requests to your localhost IPC. This looks something like: process A sends "localhost" in the Host header, and process B, upon seeing that request, says, "Okay, that's the host I was expecting." As long as the Host header you're expecting isn't one that an adversary could get the browser to send on its behalf (so don't make up a host header that someone could then go register and control as a domain), you're good to go. Because here's what happens when we try to hit process B now: you can get the browser to do a lot of things on the same origin, but one thing you can't get it to do is lie about the Host header. You can set other headers, but the browser will not lie about Host for you. So we send "Host: attacker.com," and process B says, "Hey, that's not who I'm supposed to be talking to." This is probably the easiest way to mitigate this particular attack. Another option, which has some pitfalls but which I'm still going to talk about, is to use some kind of cryptographic authentication scheme that is not TLS.
TLS is option three; for option two, you just sign your RPC. An important note here: you need those keys to be generated on a per-execution or per-installation basis. If the keys are bundled in the application, anyone can forge signatures. So in normal operation, process A signs the request to process B; everything's happy, and process B accepts it. But when we try to own process B, we can't generate a valid signature, unless of course they screwed up and used a single key pair bundled into the installer. So don't do that. Also, like I said, there are pitfalls: the scheme as described is vulnerable to replay attacks, so be careful if you choose this option. Finally, you can just use our old friend TLS. The reason this works is that in normal operation, process A sends a Client Hello, gets back the Server Hello with the certificate, and process A knows what hostname to expect, which once more could be localhost. This doesn't have to be a real web-PKI certificate; in fact, it shouldn't be. It can just be some self-signed garbage you made. When the browser tries this, it sends its Client Hello, process B sends back the Server Hello with this certificate that has a CN of localhost, or garbage.localhost, or whatever, and the browser bails out. This is a TLS error. And once more, remember that the JavaScript can only compel the browser to make requests; it can't make its own requests and say, "Oh no, I don't care about TLS errors." So I just talked a bunch about how to do this in general; now I'm going to talk a bit about how we did it. Our attack tool is a single Go binary that gives you a DNS server that is stateful and can do those tricky things, like returning different results after a number of queries, integrated with a server that will serve you up logs nicely and has a splash page, plus a server that serves JavaScript payloads.
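The signing option can be sketched with an HMAC over the request body. This is an illustration under assumptions (key size, encoding, and message format are all made up), and, exactly as noted in the talk, this sketch is still replayable; a real scheme would add a nonce or counter.

```go
package main

import (
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// newKey generates a per-installation (or per-execution) key. The whole
// scheme collapses if this is instead a constant baked into the installer,
// because then anyone can compute valid signatures.
func newKey() []byte {
	k := make([]byte, 32)
	rand.Read(k)
	return k
}

// sign computes an HMAC-SHA256 tag over an RPC message.
func sign(key, msg []byte) string {
	m := hmac.New(sha256.New, key)
	m.Write(msg)
	return hex.EncodeToString(m.Sum(nil))
}

// verify checks a tag in constant time via hmac.Equal.
func verify(key, msg []byte, sig string) bool {
	return hmac.Equal([]byte(sign(key, msg)), []byte(sig))
}

func main() {
	key := newKey()
	req := []byte(`{"method":"doAction"}`)
	sig := sign(key, req)
	fmt.Println(verify(key, req, sig))   // process A's signed request: accepted
	fmt.Println(verify(key, req, "xyz")) // browser-originated request: rejected
}
```

The browser can be made to send headers and bodies, but it has no way to learn the key, so rebound JavaScript can't produce a tag that verifies.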
The reason it's all in one application is that we're lazy and we wanted to share state between the components easily, because that lets us pull off some really cool tricks to accelerate the attack later. So far, so expected, architecturally: your DNS server binds on port 53, your HTTP server binds on port 80. Nothing weird has happened yet. But here's where things get weird. A reminder on the same-origin policy: an origin is defined as a (protocol, hostname, port) tuple. For example: http is the protocol, foo.rebindmy.zone is the hostname, and 1337 is the port. What this means is that if we want to serve JavaScript payloads that can hit our target on any port, such that we can be the same origin as any port, we need to be able to serve from any port. The way we handled this was with an iptables hack: all TCP traffic for the payload IP address, which is a second IP address separate from everything else, on any port, gets forwarded to port 80 on a local IP address. Then our payload server can just be a normal web server that doesn't have to do anything crazy like listen on every port; it just listens on that local IP on port 80. But from an outsider's perspective, if you talk to that IP address on any port, it will serve you HTTP. So this is what the tool's generated domains actually look like. What is this garbage? It's a configuration dictionary embedded in a hostname. The reason we wanted this: why bother making people write a config file or something? They can just shove the configuration directly into the hostname and get whatever behavior they need. In this particular case, what we've got in our config dictionary is a UID. This is just a magic key we put in because, like I said, it's a stateful DNS server.
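A rule along these lines implements the port-wildcarding trick just described. The addresses here are illustrative placeholders, not the tool's actual configuration: 198.51.100.7 stands in for the dedicated payload IP, and 10.0.0.2:80 for the local address and port the payload web server listens on.

```shell
# Rewrite ALL inbound TCP, on any destination port, that arrives for the
# dedicated payload IP so it lands on the payload server's single listener.
iptables -t nat -A PREROUTING -p tcp -d 198.51.100.7 \
    -j DNAT --to-destination 10.0.0.2:80
```

With this in place, a connection to the payload IP on port 1337, 8080, or anything else gets answered by the one ordinary web server on port 80.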
So if more than one person is using it, or you're doing more than one attack at once, you could clobber each other's state; the UID is there to prevent that. You pick something unique. It can be anything that is valid for this part of a hostname and doesn't contain a hyphen, because the hyphen is the key separator. Next up is the first IP address that will be returned when resolving this name; in this case, it's just 1.2.3.4. Note the x's: as you might know, dots have semantic purpose in hostnames, and the implementation got really, really hairy otherwise, so we decided to use a separator character that has no meaning in a hostname. In terms of what this IP should be: by default, we usually set it to our JavaScript payload server, which serves the JavaScript that then tries to do the attack. But if you've got your own server, put your own IP address in. Next up is the second IP. This is our target. Here I just have a very vanilla example of 5.6.7.8; in reality you might be hitting localhost, or 192.168.0.1, which might be a very interesting place to go if you wanted to own a router. It can be whatever you want. Finally, the remaining component is for dynamic configuration. Like I said, all of this garbage is a dynamic configuration dictionary. We originally were going to have a static mode, but it hasn't really seemed necessary so far; we might add it later if we find a reason to. Having just described this garbage hostname format, I'm going to have Alan show how it actually works. Okay, so we're going to walk through a couple of examples in the terminal, just interacting with this strange server, which will hopefully help understanding, because it's definitely very complicated. For starters, let's just look at the delegation for our name. And by the way, you can follow along on your computers: rebindmy.zone is actually a hostname that we own, and it's running the server right now.
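The hostname format just described can be decoded with a small parser. This is a sketch under assumptions: the example name, the exact field order, and the idea that the last field is a hit count are taken from the examples in this talk, not from the tool's source, and the real tool supports other trailing keys (like the HTTP-state boolean shown later).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// rebindConfig is the configuration dictionary embedded in the first
// label of a hostname like "abc123-1x2x3x4-127x0x0x1-3.rebindmy.zone".
type rebindConfig struct {
	UID      string // per-attack key; must not contain '-', the separator
	FirstIP  string // served for the first N queries (payload server)
	SecondIP string // served afterwards (the target)
	Hits     int    // N, how many times to serve FirstIP
}

func parseRebindName(host string) (rebindConfig, error) {
	label := strings.SplitN(host, ".", 2)[0]
	parts := strings.Split(label, "-")
	if len(parts) != 4 {
		return rebindConfig{}, fmt.Errorf("want 4 hyphen-separated fields, got %d", len(parts))
	}
	// Dots have semantic meaning in hostnames, so the IPs use 'x' instead.
	deX := func(s string) string { return strings.ReplaceAll(s, "x", ".") }
	hits, err := strconv.Atoi(parts[3])
	if err != nil {
		return rebindConfig{}, err
	}
	return rebindConfig{parts[0], deX(parts[1]), deX(parts[2]), hits}, nil
}

func main() {
	c, err := parseRebindName("abc123-1x2x3x4-127x0x0x1-3.rebindmy.zone")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", c)
}
```

Embedding the config in the name means the DNS server needs no out-of-band setup per attack: the query itself carries everything.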
I'm in a separate context from you because it's a stateful server, so you can't mess up my demo, but you can mess up each other's, so feel free to do that. So, dig here is returning the recursive resolution for rebindmy.zone, which of course comes through the authoritative name servers for dot and for zone. You'll note that the A record we got back is 18.217.171.44, and that we also got that A record back from 18.217.171.44. This is really important, because it's what Danny was talking about before: we have an integrated HTTP and DNS server. That IP address is the authoritative name server for our name, rebindmy.zone, and it is also where the HTTP server for that name lives, which is very unusual. So let's look at a rebind hostname. I can just choose a UID, abc123, I'll set IP1 to be 1.2.3.4, I'll set IP2 to be 127.0.0.1, and I'll configure it to use three hits. So for the first three hits we'll get back 1.2.3.4, and after that this name will resolve to 127.0.0.1. So we got back an A record of 1.2.3.4, as expected. The TTL is zero seconds, because we want to encourage the caches not to keep this in cache very long. As expected, for the second query and the third query we continue to get back 1.2.3.4, and on this fourth query we should expect to get back 127.0.0.1. So now the name is returning localhost, which means that the rebind has been performed. On this server here, I'm already running a little demo HTTP server on localhost port 1337. It just returns some HTML that says, you know, "DNS rebinding demo server, hello world." Next I'm going to demo the integrated HTTP state feature, where we feed the HTTP requests being made against our server into the rebinding state so that we can optimize the attack. So for example, we might do something like abc456. This time, I'm just going to exclude IP1.
If you don't put the first IP address into the configuration, it's assumed to be the catch-all payload host that Danny was talking about before, which delivers the JavaScript payloads. IP2 is 127.0.0.1. And this time, instead of a number of hits, I'll put in HTTP state, which is just a boolean, so it's one. So this is returning the payload server IP address. I'm in a separate instance from you, so my payload server is in private address space; you'll get a slightly different answer if you run this query. But as expected, we're getting back the payload server. ...It's just frozen; something screwed up my Wi-Fi. I also have all these demos recorded, but I don't really believe in recorded demos, so hopefully we can get this working. Well, we can just use the recorded demo. We already went through all of this. This was the name I constructed, the same as the one I had in my terminal before it froze. You can see I have an implicit IP1, an IP2 of localhost, and an HTTP state boolean set to one. I ran that dig command and got back the capture IP address, as expected. This is the public capture IP address, unlike the one I had before in my demo. And of course, if I keep resolving this name, I keep getting back that IP address, the payload server. Now if I curl it on any port, I get back the JavaScript payload; we'll talk in a bit about what this payload actually does. But if we dig that name again, now that we've downloaded the HTTP payload once, every time we resolve the name we get back localhost. I was running this on my personal website's server, so when I curled that name again, now that the rebind had occurred, it returned localhost, went to my local web server, and served up my awesome homepage. So I'm going to hand it back to Danny, who's going to explain how we're going to use these constructs to build a port scanner. Okay. Cool.
So Alan just demonstrated how to hit a server once, and some of you might see where this is going in terms of a port scan, but I'm going to run through it real quick. To scan each port on a target, you need a distinct hostname, because of that issue I mentioned where an origin is defined by the (protocol, hostname, port) tuple. What that means is that you're going to need a separate iframe for every port on every origin you're scanning. What this looks like is this: we connect on, say, port 532, hitting back to localhost. We use HTTP state because it's a lot faster, and to stop us clobbering ourselves, we just tack the port number onto the end of our UID. Basically, the first time the page loads, the payload JavaScript is loaded, and then, because we're using HTTP state, the DNS server switches over to localhost. That JavaScript just requests itself over and over until it stops seeing a sentinel value. Hopefully, if the rebind works, that'll be on the next request, but in practice caches exist and it'll probably take a couple. It requests itself until the sentinel disappears, at which point it knows the DNS rebind worked, and whatever it gets back is what's on that port on the target host. Once that happens, the child iframe that was running our JavaScript reports back up to the parent: either "hey, connection closed," or "hey, I found a web server." So that's all well and good, but I'm sure you want to see it actually happen, so I'm going to hand it back to Alan. Okay, so I hope my internet has been fixed and I can run the demo live. What we've got here is just a page on rebindmy.zone; you can go and load this page yourself. At the bottom here, there's an iframe showing the server request log.
So as we make DNS and HTTP requests for the purposes of this demo, they show up in the log. The first thing we'll do is open an iframe for port 1336. This red border denotes the iframe we just opened, and you'll see we have a rebind name associated with it, similar to the ones Danny was just talking about: the payload server IP address is the first IP address, and the target is localhost, because that's what we're scanning. And of course, we've opened this for port 1336. In the request log, you'll see that we actually got two DNS requests; that's because browsers are weird and sometimes make more DNS requests than they need to. Both times, we handed back the payload server IP address, and then we got a request on the payload server on port 1336 and served up the payload JavaScript, which is running in this iframe. So now we can have the payload in the iframe make a request to itself, and hopefully, if the rebind has been performed, it will actually be making a request to localhost port 1336. And it did: you'll see the DNS request came through and we returned 127.0.0.1, and when the Ajax request was attempted, it got a network error, because I'm not actually listening on port 1336. So the port scan result is that this port is closed. Now let's do the same for port 1337, where I do have a server. This is exactly the same; the name is slightly different, so it has different state. You'll see we have the port number 1337 in the hostname. The DNS requests came in and we returned the payload IP address, exactly like last time. Now we'll have it make a request, and hopefully, if the rebind has been performed and it's not in cache, we should get back the HTTP banner from the server I showed you before. And it did. It worked.
So the port scan result is that I do have a server listening on localhost port 1337. It got the headers, which show that it's a simple Python server, and it grabbed the body of the index page, which just has the hello world I showed you before. So that naive implementation is good and mostly solves our problems, but it's not quite efficient enough to scan large port ranges. Even if we automated the button clicking (I spend a lot of time talking between each button press), it wouldn't be enough to actually be useful in practice. Some of the time sinks: we're waiting for DNS TTLs to expire. Even though we set the DNS TTL to zero seconds, there's the local DNS cache, a DNS cache in the browser, and a DNS cache at the recursive that can all hold the same name for extended periods, and we have to wait for those to be purged before the authoritative is asked again. Establishing HTTP connections can be quite slow: there's a full TCP handshake round trip, and if connections have to time out, which happens quite often when firewalls are involved, the browser's timeout window can be up to 15, 30, even 60 seconds. Finally, creating iframes for each target port uses a lot of browser resources. You might naively think, why don't we just make 65,000 iframes at once? But each of them requires a full DOM and a full JavaScript execution environment; it's quite slow, so we have to rate-limit how many exist at any given time. Our simple optimizations: we create the iframes in parallel, but not too fast. We spent some time hand-tuning this, and we think we have a good rate for a variety of systems. As mentioned before, we set the DNS TTL to be just one second.
This is actually not in compliance with the minimum TTL you're supposed to use, which is something like 60 seconds, but we have yet to find a resolver that enforces it, so we do it anyway. We also force all TCP connections to be closed after a single HTTP request. If we used keep-alive or something like that, the browser might not make as many DNS requests, and that would be really bad for performing the rebind, so we always set Connection: close, and we always forbid caching. All of this is just to try to get the browser to make as many requests as possible. We've talked about this a few times, but the payload iframes only need to load the HTML once. At first we were using a simple counting method, or a time-based flip-flop method, to decide when to perform the rebinding, but it was really slow. Browsers were all over the place about how much time to wait or how many requests it takes; notice that my browser made two DNS requests each time I loaded a new iframe. We found that a really, really clear signal is the end-user browser having downloaded the HTML page once: we know we can always rebind after that point. This might not work if there's some kind of strange proxy in the middle, so it could require some fine-tuning. Next, we needed the iframes to perform a handshake. Sometimes firewall rules and internal browser policies prevent an iframe from being loaded in the first place, so our payload never gets loaded at all, and we need to be able to detect this case. The simple answer is that the payload sends a postMessage when it loads for the first time, saying to the parent, "Hey, I'm here, I'm going to start doing my port scan." If there's a timeout and we never get that message, we know the browser couldn't load that iframe at all.
This doesn't really tell us anything about the victim, but it's important to keep track of the fact that we weren't able to scan the victim on this port. Finally, an interesting optimization: stub resolvers, browsers, and recursives have surprisingly small cache sizes. The default Firefox DNS cache size is only 400 entries. So if you generate a lot of spurious DNS requests, you can get your rebind record evicted from the cache prematurely, which helps force the queries to make it all the way to the authoritative. This works really well against Firefox. So I'm going to give you an end-to-end demo. Okay, so what's going on? This is happening kind of fast. I'm scanning a range of localhost ports starting at 1000; this blue bar is going up as it progresses. There's actually a striped blue bar at the very tip showing how many iframes have been created, and the solid blue bar shows the proportion of ports that have been fully scanned and have responded. You'll see the log showing that all these ports are closed; it's scrolling by fast. It found that port 1337 was open, and the tool actually grabbed the banner and the index body that I showed you before; it's just not in the GUI. So I'm going to hand it back to Danny, who's going to talk about future research areas. So we just talked a lot about our tool, and now I'm going to talk about how we want to use it and how we want to see other people use it. There are a lot of bugs to be found, we think, in localhost HTTP servers, as well as in LAN-internal stuff. I've alluded a few times to owning routers and owning fridges. That's because a lot of things rely on being behind a firewall or a NAT to keep them safe. A lot of routers have plain-HTTP login pages that only get exposed to the internal network but have crappy default passwords on them; same with IoT devices, et cetera.
There's also, and I'll touch more on this later, the fact that even if something isn't an HTTP server, we think there are a few ways you can use this technique to possibly hit it, even if you couldn't before. And finally, we're really interested in just finding out what's out there. You've heard me assert a few times that this design pattern is common, and I've seen it many times, but I don't have numbers for you, because no one has measured it. So, a quick overview of why we think hitting loopback is interesting. The reason this paradigm is kind of common, other than being very portable, is this: take something like the Spotify app, where you want the user to be able to click a link in their browser and then have it open in the Spotify application. You could use a custom protocol handler, like irc: or something, but that pops up a warning message for the user: "Hey, you're using this weird protocol, are you sure this is okay?" Users don't like that; they have to click again. So what if, instead, the page just made an HTTP request to localhost with a command to open that link? This is a design pattern people sometimes use to get around the warning about custom protocol handlers, so you'll see it in the wild. Of course, if you do that, you can't do TLS: you cannot have the local Spotify server hold a valid TLS certificate for spotify.com, or you're going to have horrible impersonation problems. And Host header validation is just not something people really think about when they're rolling a quick-and-dirty HTTP server on localhost, but it stops this bug.
In terms of routers and IoT devices, I talked about this a little bit, but there's actually a nasty problem here: for those, checking Host headers isn't necessarily desirable behavior from a user-experience perspective, because people like to be able to go to, say, mycamera.whatever and get their IoT camera or their router. So unless the name is completely determined ahead of time, you're going to have Host header problems there. They also normally don't use TLS, for similar internal-network reasons: it's not clear what DNS zone the hostname would be valid for. So yeah, my quick example, as I've said a couple of times: it's very easy to own routers using this technique. In terms of mitigation, Mike West at Google has a document, called CORS-RFC1918, that talks about some ways to make attacks like this less bad. It's worth looking at if you care about this sort of thing. Finally, fun with non-HTTP protocols. This is a little more out there than the other parts of the attack, and probably has a higher work-to-reward ratio, but there could be some really neat stuff here. Just because you can only speak HTTP doesn't mean the other end can't be speaking something else. If a service is listening on TCP, it's a text-based protocol, and you have the ability to inject HTTP headers, you can get commands, or things that might look like commands to that service, fairly early in the request, such that if the service is somewhat tolerant of malformed input, you can inject some kind of valid command in its protocol. Binary protocols are going to be a lot harder. I'm not going to say it can't be done; I'm just going to say it's probably going to be difficult. On surveying: I think this Twitter conversation speaks for itself, but we don't know what's out there. If you have control of a Windows fleet, like if your organization has a wide install base with a diverse menagerie of consumer
applications installed on Windows endpoints, I would love to know what's listening on local servers. But I don't have the ability to do that scan, so instead I built a tool (we built a tool, yes) that at least lets you do point checks. Finishing up: we believe in having our code up, so it's all on GitHub. The only tricky part of the installation is setting up that iptables hack; the rest is just Docker, so you'll be pretty good to go. We're also going to keep the hosted instance running for a while, at least. It generates a lot of DNS requests and such, so Amazon might kick us off eventually, and it might also get expensive, in which case we'll turn it off, because you can run your own. If you want to reach out to us, I'm dcooper at akamai.com, and Alan is aworth at akamai.com. I hope I see some of you all at the CTF.