Hi, I'm Daniel from Akamai, and today I'm going to be talking about cryptographic security for NTP. So for starters: NTP, who cares? Well, I'd like to thank the anonymous reviewer who recommended that this talk be accepted and succinctly hit the nail on the head with an answer to that question. NTP itself is not a high-profile network protocol, but it underpins pretty much everything else, and that makes it an important target for scrutiny. What is everything else? Well, a lot. Take this list with a grain of salt: I came up with a few of these things off the top of my head, and a lot more of them come from grepping the corpus of RFCs for the word "timestamp", so there's probably a lot I left out. Apparently we can now add DNS to that list. But this stuff is mostly boring. X.509 is mostly just used for TLS, and who cares about that? So let's talk about something interesting instead. Let's talk about sharks. A few years ago there was a biology grad student at UCSD who was tracking the migrations of a school of white sharks. His team had tagged the sharks with GPS trackers, which were periodically sending timestamped location data to a server in their lab. Unfortunately, that server's NTP daemon was misbehaving, and as a result its clock was fast. Now, as far as anybody knows, this was not the result of an attack. Maybe it was a bug, maybe it was misconfiguration, maybe there was a problem with the upstream servers; I don't think anybody ever got to the bottom of it. But as a result of the clock being fast, the tracking program thought that the data it was receiving was old, so it applied dead reckoning to infer the sharks' current position. This placed the sharks, teleported them if you will, just offshore of a popular beach. The grad student sees this and, leaping into action, phones up his friend the lifeguard. Mass panic predictably ensues. The beach gets evacuated and shut down until, some time later, a very embarrassed grad student realizes his mistake. 
So the moral of this story is that when you think of NTP, don't just think of your common desktop applications. Don't just think of your TLS server. Don't just think of high-frequency trading. There are more things that depend on NTP than are dreamt of in your philosophy, and when it breaks, it can break in some very surprising ways. So, having established the importance of the integrity of our system clock, it follows that we'd like our NTP data to be authenticated. Here's what's available for that today. The overwhelming majority of NTP users are running it without any authentication at all, and the few of you in this audience who are an exception, you know who you are. NTP does support a symmetric authentication mode, which basically just tacks a MAC onto the end of every packet. It's somewhat broken, but the more important issue is that, being symmetric, it's only useful to you if you've arranged in advance to share a key with your server operator, and obviously this doesn't scale to real-world deployment. Finally, NTP supports something called Autokey, which is an attempt at supporting public-key authentication. But as I will return to in a bit, it is thoroughly broken. To quote Harlan Stenn, who maintains the ntpd reference implementation: if you're using Autokey, you should stop. Now, before I venture any further into the weeds of NTP authentication schemes, I'm going to need to explain a bit about how NTP works. This is a topic which could easily fill a college semester, but here is a super-abridged version with just enough information so that you can follow the rest of this talk. Here on the slide you see an NTP packet header. This header can be followed by some extension fields and then by a MAC, but in most usage neither one is present, so this header is the entire packet. The four fields that are interesting for our purposes are highlighted in yellow. NTP's basic time-synchronization algorithm involves four timestamps, which I've laid out here. 
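To make the header layout concrete, here's a minimal sketch of a client-side parse of the mode field and the three on-wire timestamps, with field offsets per RFC 5905. The function and variable names are illustrative, not from any particular implementation:

```python
import struct

NTP_EPOCH_OFFSET = 2208988800  # seconds between the NTP epoch (1900) and the Unix epoch (1970)

def parse_ntp_packet(data: bytes) -> dict:
    """Pull the version, mode, and timestamps out of a 48-byte NTP header."""
    if len(data) < 48:
        raise ValueError("NTP header is 48 bytes")
    first = data[0]
    mode = first & 0x07            # low 3 bits: mode (3 = client, 4 = server)
    version = (first >> 3) & 0x07  # next 3 bits: protocol version

    def ts(offset):
        # A 64-bit NTP timestamp: 32-bit seconds since 1900 plus a 32-bit fraction.
        secs, frac = struct.unpack_from(">II", data, offset)
        return secs - NTP_EPOCH_OFFSET + frac / 2**32

    return {
        "version": version,
        "mode": mode,
        "origin": ts(24),    # T1, as echoed back by the server
        "receive": ts(32),   # T2, when the server received the request
        "transmit": ts(40),  # T3, when the server sent the response
    }
```

Note that there is no field here for T4: as described above, it never crosses the wire, so the client simply reads its own clock when the response arrives.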
There's also a fifth one, the reference timestamp, but for today's purposes you can ignore it. These four timestamps represent a chronological sequence of events: request is sent, request is received, response is sent, response is received. Note that only T1 through T3 appear in the packet; T4 never crosses the wire, it's just measured by the client when the response arrives. So given these four timestamps there are a few important values we can compute. One is the network round-trip time, delta. T4 minus T1 represents the whole interval from when the request was sent until the response was received, while T3 minus T2 represents the time the server spent processing the request. So subtract that part out and you're left with just the time due to network latency. The statistic we really care about here, though, is theta, which represents the offset between the server's clock and the client's clock. We can estimate theta as I've shown here, but this equation incorporates a key assumption, which is that network latency is symmetrical: in other words, that our request and our response each took the same amount of time to cross the network. Of course, no matter how good our authentication scheme is, it's quite possible for a network adversary to mess with this assumption by delaying packets in one direction but not in the other. So we have this lambda statistic, also called the synchronization distance, which basically represents our maximum error: how far off our estimate might be as a result of this. The first term, delta over 2, represents the worst case of network asymmetry, where one leg is instantaneous and all of the latency occurs on the other leg. Epsilon captures a few relatively minor sources of error. One thing that goes into epsilon is that our clock_gettime() call takes a nonzero amount of time to return, so every timestamp has some inherent imprecision in it. 
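The computations described above can be sketched as follows. The function name is mine, and the example values assume a client running 5 ms behind the server across a symmetric 20 ms round trip:

```python
def ntp_stats(t1, t2, t3, t4, epsilon=0.0):
    """Round-trip delay (delta), offset estimate (theta), and synchronization
    distance (lambda), using the standard four-timestamp exchange."""
    delta = (t4 - t1) - (t3 - t2)        # time actually spent on the network
    theta = ((t2 - t1) + (t3 - t4)) / 2  # offset, assuming symmetric latency
    lam = delta / 2 + epsilon            # max error: all of delta on one leg
    return delta, theta, lam

# Client sends at 0.000, server receives at its 0.015 and replies at 0.016,
# client receives at 0.021: 20 ms total on the wire, client 5 ms behind.
delta, theta, lam = ntp_stats(t1=0.000, t2=0.015, t3=0.016, t4=0.021)
```

With these numbers, delta comes out to 20 ms, theta to +5 ms (the client should advance its clock), and lambda to 10 ms: even with perfect authentication, an adversary delaying one direction could have skewed the estimate by up to half the round-trip time.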
Second is that our local clock might drift a little while the request is in flight, so our measurement of the round-trip time might itself be a little off. But in practice the delta-over-2 term dominates this equation even if you're running over a fast LAN, and epsilon is pretty insignificant. So, moving on to the mode field: the overwhelming majority of NTP operates in client/server mode. The client sends a mode 3 request and the server sends back a mode 4 response. But NTP also supports a symmetric mode and a broadcast mode. In symmetric mode, two systems synchronize to each other. You can either have both systems explicitly configured to talk to each other, in which case they'll both send each other mode 1 packets, or else one system can be configured to automatically peer with any system which sends it a well-authenticated mode 1 packet, in which case the packets it sends back will be mode 2. In broadcast mode, the client starts out by performing some small number of normal client/server volleys with the broadcast server in order to establish what the network latency is, but thereafter it just passively listens for time broadcasts without sending out any further requests. I am not a fan of this mode and I don't really know anybody who uses it. Modes 6 and 7 are used for status queries. They use a completely different packet format than I've shown you here. Mode 6 is quasi-standard; mode 7 is non-standard. Mode 7 is what's largely responsible for the massive DDoS amplification attacks you're probably familiar with. So, as I said earlier, symmetric authentication has some problems. First is that the MAC function is MD5(key || message). Given the currently supported vocabulary of extension fields, there's nothing particularly interesting you can do with length-extension attacks, so this isn't completely the end of the world, but it's certainly not good. The case around replay protection is also dubious. 
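A minimal sketch of that legacy MAC construction, MD5 over the key concatenated with the message (the function name is illustrative):

```python
import hashlib

def legacy_ntp_mac(key: bytes, packet: bytes) -> bytes:
    """NTP's legacy symmetric authenticator: a plain MD5 digest of key || packet.
    Note this is the secret-prefix construction, not HMAC, which is why
    length-extension is even a question; on the wire the MAC field is a 32-bit
    key ID followed by this 16-byte digest."""
    return hashlib.md5(key + packet).digest()
```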
The basic mechanism for replay protection is the origin timestamp. The client sends out T1 in the transmit timestamp field of its request; the server copies that value back into the origin timestamp field of its response, and the client verifies that the response matches the request. But if the client's clock is fast and has to be stepped backwards, then there's a danger that an origin timestamp will be reused. The same goes if multiple clients are configured to use the same key, which, among the few deployments that actually use authentication at all, is fairly common because administrators are lazy. It's possible to use random values in place of real timestamps if you want, since all we're doing with this field is checking for a match, but since they're only 64 bits long, the birthday bound gets uncomfortable. Also, if you're operating in symmetric mode, the initial request of a session always has an origin timestamp of zero, and in broadcast mode the origin timestamp is always zero, so these modes are more severely vulnerable to replay than client/server mode. The same MAC key gets used in both directions. In the client/server topology this is okay, because client packets are clearly distinguished from server packets by the difference in their mode field. But as I mentioned, in symmetric configurations you may have mode 1 packets going in both directions, which means a packet sent in one direction can be replayed in the other direction. Also, symmetric-passive servers can be flooded with replayed packets from a bunch of different spoofed IPs, which will cause them to stand up a bunch of spurious associations. Last but not least, NTP has a history of disastrous bugs in its authentication code. The worst is that until early 2015, any packet which contained a MAC would be accepted as authentic. The code was verifying the MAC and setting an error flag if the MAC was invalid, but then nothing was ever actually checking the status of that flag. 
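The origin-timestamp check can be sketched roughly like this, using the random-value variant mentioned above. The class and method names are illustrative:

```python
import os

class NtpClientAssociation:
    """Sketch of the client side of origin-timestamp replay protection."""

    def __init__(self):
        self.pending_t1 = None

    def make_request_t1(self) -> int:
        # A 64-bit random value works in place of the real clock reading,
        # since the field is only ever compared for equality; but at 64 bits
        # the birthday bound is uncomfortably close for a busy deployment.
        self.pending_t1 = int.from_bytes(os.urandom(8), "big")
        return self.pending_t1

    def accept_response(self, origin_field: int) -> bool:
        # The server must echo our transmit timestamp back as its origin
        # timestamp; anything else is a replay, a spoof, or a stale response.
        if self.pending_t1 is None or origin_field != self.pending_t1:
            return False
        self.pending_t1 = None  # accept at most one response per request
        return True
```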
And if you think that one's bad... Now, I'm going to be as gentle as I can with Autokey, because it was designed in an era where, I'm not going to say nobody knew better, but nobody knew very much. This was a few years before anybody noticed that Needham-Schroeder was broken; it well predates SSL. But with that proviso, when Harlan said stop using it, he was giving sound advice. So at this point a lot of you are probably wondering why NTP has to be such a snowflake. Why can't we just tunnel it through DTLS and call it a day? Well, first let's address a few specious answers to that question which come up often. First, there's the issue I mentioned earlier about how an adversary can influence time estimates by delaying packets. But that's not a problem that some protocol other than DTLS is actually going to solve. The best answer here is to just continue computing error bounds, that lambda statistic I mentioned, at the application layer, just like we do today. Another specious objection is that before we have correct time, we won't be able to tell whether an X.509 certificate offered by the server has expired. But again, there is no magic solution here that we'll achieve by avoiding DTLS. The best we can do is to reject any time response which would tell us that the current time is outside the validity window of the certificate the server gave us. This won't prevent a server from handing us an expired certificate and then also serving us old time, but that's a threat we can mitigate just by querying multiple servers and rejecting any outliers, which is something that NTP already supports and has for a long time. A final objection is that DTLS can't deal with broadcast mode, which is true enough, but I have a really simple answer to that one: don't use broadcast mode. There's just no good motivation for it. 
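The query-multiple-servers-and-reject-outliers idea can be sketched with a toy median filter. This is a deliberate simplification of NTP's actual clock selection algorithm; the function name and tolerance are illustrative:

```python
from statistics import median

def reject_outliers(offsets, tolerance):
    """Toy stand-in for NTP's selection algorithm: keep servers whose offset
    estimate lies within `tolerance` seconds of the median, drop the rest."""
    m = median(offsets)
    return [o for o in offsets if abs(o - m) <= tolerance]

# Three honest servers agreeing to within a few milliseconds, plus one
# falseticker whose answer is a year off: the falseticker gets discarded.
survivors = reject_outliers([0.002, -0.001, 0.003, 31_536_000.0], tolerance=0.1)
```

A lying server can only win this game by colluding with enough other servers to outvote the honest ones, which is exactly why querying multiple independently operated servers is worthwhile.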
But finally we come to an objection that can't be dismissed so easily, which is that it's really important to allow NTP servers to be stateless. Let's talk about why. NTP scales a lot differently than, say, a web server does. Any given client generates very little traffic, not even four packets per hour. But NTP servers typically serve a massive number of clients; we may be talking tens of millions in some cases, like the NIST servers. That means that if we need to hold even a small amount of state for every client that has a session open, we're going to come under memory pressure long before we come under CPU pressure. So with all that said, let's talk about what we're doing to fix this mess. Network Time Security is an IETF effort aimed at replacing Autokey. It's been in progress for several years; Dieter, Kristof, and Stephen were the original authors. I came into the picture about six months ago with what was originally a competing proposal, but as of the last IETF meeting we've joined forces and we have a joint draft in progress. Now, one thing that both NTS author teams figured out pretty early on is that there's no single approach that's going to be workable for all of NTP's different operating modes. The biggest friction point is the conflict between the need for statelessness in some modes and the need for mutual replay protection in others. Obviously you can't have both of these things at the same time, since if you aren't willing to update your state in response to receiving an acceptable packet, then if that packet comes in a second time, you'll be in the same state as before, which means you're going to accept it a second time as well. Fortunately, no NTP mode requires us to do the impossible and provide both these things simultaneously, but we do need to do different things for different modes. Symmetric and control modes are relatively easy. We do need mutual replay protection. 
There isn't enough fan-out for us to care about statelessness, so plain old DTLS pretty much suits our needs perfectly; the spec for how to do this is literally a couple of paragraphs. But for client/server mode, which is what virtually everybody actually uses, we need to do a little more work. Now, we can still do an unmodified TLS handshake. This requires the server to hold state for one round trip while the handshake is in progress, but as long as that state can be discarded immediately afterwards, and doesn't have to be held for the entire duration of the client session, which can be days or weeks or months, that's acceptable. In that case, even an implausibly busy server is going to have at most a couple of megabytes of state related to handshakes. But using DTLS for the application data in this case isn't going to cut it. So here's our solution for client/server mode. We do a TLS handshake on a separate port. After that handshake, we exchange one volley of TLS application data to negotiate an AEAD algorithm and to send the client a supply of cookies. Now, the choice of AEAD algorithm is a little delicate here, because in some deployments, particularly ones that involve load-balanced clusters or VM images that might get snapshotted and restored, it can be very difficult to avoid accidental nonce reuse. So the mandatory-to-implement algorithm we chose for the Internet-Draft is AES-SIV-CMAC, which gives us resistance to accidental nonce reuse. Also, the CFRG is currently working on a new proposal called AES-GCM-SIV, which provides the same sort of misuse-resistance properties but much better performance, and at such time as that draft becomes an RFC we'll start encouraging implementers to negotiate that instead. The cookies function a lot like TLS session tickets. They contain some information which the client stores on the server's behalf and sends back later, so the server can resume the session without having to retain anything in the interim. 
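Here's a toy sketch of how such a cookie lets the server forget everything: it seals the per-client session keys under a secret of its own, hands the result to the client, and recovers the keys when the cookie comes back. The construction below (a SHA-256 counter-mode keystream plus an HMAC tag) is only a stand-in for a real AEAD like AES-SIV-CMAC and must not be used as actual cryptography; all names are illustrative:

```python
import hashlib
import hmac
import os
import struct

MASTER_KEY = os.urandom(32)  # server-side secret; rotated periodically in practice

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy stream cipher: SHA-256 in counter mode. A stand-in only.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + struct.pack(">I", counter)).digest()
        counter += 1
    return out[:length]

def mint_cookie(c2s_key: bytes, s2c_key: bytes) -> bytes:
    """Seal the per-client session keys so the server can discard them."""
    nonce = os.urandom(16)
    plaintext = c2s_key + s2c_key
    ks = _keystream(MASTER_KEY, nonce, len(plaintext))
    ct = bytes(a ^ b for a, b in zip(plaintext, ks))
    tag = hmac.new(MASTER_KEY, nonce + ct, hashlib.sha256).digest()
    return nonce + ct + tag

def open_cookie(cookie: bytes, key_len: int = 32):
    """Recover the session keys from a returned cookie; no per-client state."""
    nonce, ct, tag = cookie[:16], cookie[16:16 + 2 * key_len], cookie[16 + 2 * key_len:]
    expected = hmac.new(MASTER_KEY, nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("cookie failed authentication")
    pt = bytes(a ^ b for a, b in zip(ct, _keystream(MASTER_KEY, nonce, len(ct))))
    return pt[:key_len], pt[key_len:]
```

The key design point is that the only long-lived secret lives on the server side; everything per-client rides along in the cookie, which is exactly how the server stays stateless between requests.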
Also, just like TLS session tickets, there's a privacy problem here. If the client sends back the same cookie multiple times, that will enable a passive adversary to track the client as it moves between networks. That's undesirable, so we avoid it by having the server provide the client with a fresh cookie for every request. The cookie gets sent to the client encrypted but then sent back in the clear, so the adversary won't be able to link the two legs of the transaction. This is perfectly analogous to how TLS 1.3 deals with session tickets. But supporting these privacy goals in NTS is a little pointless if they're being violated at the NTP layer, and as for that, well, I'll let this slide speak for itself. Since Professor Goldberg and her student Aanchal Malhotra are both here, I'd like to thank them both: Professor Goldberg for pushing us to get this work done, and Aanchal for being my co-author on this draft. I'm going to be really happy when we get this nonsense cleaned up. Let me wrap up this talk with a quick shout-out for Roughtime, which is a separate effort that Adam Langley has been spearheading. Now, Roughtime is not NTP at all; it's a completely different protocol with different goals, and as its name implies, high precision is not one of those goals. What it provides in exchange is a better ability to ensure that servers are behaving correctly. 
Now, all along, NTP has supported the ability to weed out servers that are serving bad time, so-called falsetickers, by finding a majority clique of so-called truechimers. But the limitation of NTP, whether you're securing it with NTS or with any of the legacy authentication mechanisms, is that conversation transcripts are repudiable, so the client won't be able to prove to a third party that the falsetickers are falseticking. That's what Roughtime solves. If we can ever get to the point where the usual behavior for clients is that they first set their time using Roughtime and then do their fine tuning using NTS-authenticated NTP, that would be a very nice world, and I hope we get there in the next few years. So with that, I'll wrap it up and take questions.

Right, so we do have time for a couple of questions. Yes?

I just wanted to acknowledge other people who have been working on this design that weren't mentioned on the slides. There's a whole bunch of people from the IETF actually working on this, so I just want to mention some of their names: there's Harlan Stenn, there's Danny Mayer, Kyle Rose, me, Aanchal Malhotra, and I hope I didn't forget anyone. This is actually a big team, and all of this stuff is happening now, so it would be very, very exciting to get input from this community on this work. I'm really glad that Daniel got the chance to present it here. Thank you.

As Professor Goldberg mentioned, there's actually quite a cast who's been involved with this, and I can't come up with all of them off the top of my head either; look for the contributors section at the end of the draft. 
You were mentioning that this is connected to DNSSEC. One thing I'm observing is that a lot of people seem to sign their DNSSEC records for a year in some cases, which I find somewhat surprising. Just a couple of days ago I noticed state.gov is issuing one-year DNSSEC signatures. That seems a little long for most purposes, so don't do that. Thank you.

Is it really too slow to just sign the responses? Yes.

Is this something that the NTP pool can use, or is it going to be administratively prohibitive for them to use it? So, that's something we're still in the very early phases of addressing; by early phases I mean I sent one email about it out to the working group list and I haven't heard any responses yet. But yes, having this supported by pool servers is going to be a little more challenging than if you've specifically configured which servers you want to talk to, because it will probably require some means of having somebody who owns a CA private key, somebody that you trust, basically accredit the people that you trust to serve you valid time. This is again something that I hope Roughtime can help us with, because it will help us spot the people whose accreditation needs to go away. But yeah, this is very much an open problem. Great. All right, so let's thank Daniel again.