All right. Hello, everyone. So I'm Filippo. This is George. We work on the Cloudflare Cryptography team. Alex, who wrote most of the code, is around here somewhere. Now, we are here to talk about how to solve the Cloudflare CAPTCHAs. To solve the Cloudflare CAPTCHAs, you click the boxes with the street signs. You have to be careful not to click the boxes with a pole at the base, or any other sign that might be in the picture. All clear? Good. Then I'm done.

Okay, but seriously, we're talking about solving the problem of the Cloudflare CAPTCHAs. And in fact, while CAPTCHAs like that may seem pretty straightforward, they embed a lot of assumptions about who you are as a person: what language you're familiar with, whether or not you can see or hear clearly, whether it's easy for you to type or click all those little squares, or even things like where you grew up and how that affects your idea of what counts as a house or what a storefront might be. So there are a lot of reasons why a CAPTCHA that seems straightforward might actually be pretty difficult for someone. So at Cloudflare, we try to use them only as a last resort.

But what am I even talking about? Let's take a step back for a minute and answer a question that some of you might be asking, which is: what's a Cloudflare? Cloudflare is a service that sits in front of websites on the Internet and provides, sort of, value-added request routing. We try to protect our customers from big, obvious threats like DDoS attacks, but also from smaller things like web scraping or comment spam.
And we do this by inspecting signals that we extract from the requests that users send through us, looking at things like what resource they're requesting, what browser they're using, or what country they're from, to try and make a determination about whether this is a real human using a website, who we should obviously let through, or some kind of bot that's attacking one of our customers, which we should obviously restrict. And generally, we don't have a problem with this. We can look at all of these traits of requests and make our decision pretty easily.

But in one particularly notable case, we have almost none of the signals that we normally operate on to go on. And that is Tor Browser. Tor Browser, in addition to sending all of its users' traffic through Tor, also takes a lot of additional security measures to protect the anonymity of its users, which means that all of the things we would normally look at are either deliberately obscured or outright denied to us. So when we still need to make a decision about requests that are coming from Tor Browser, where everyone looks pretty much the same and is coming over Tor, we have to fall back to much broader and less precise sorts of signals. And in the case of Tor Browser, the main one that matters is IP reputation. And again, because it's Tor Browser, what I mean here really is the IP reputation of the Tor exit relays. And as you can imagine, they're not great. It's mostly because enough attackers do use Tor that our automated systems, which watch for attacks and then downrate the IPs we see them coming from, automatically assign bad ratings to most of the Tor exit relays most of the time. And when we have only IP reputation to go on, when we can only see that, hey, this request is coming from one of those sketchy IPs, we have to assume that that traffic might be malicious, and we serve a CAPTCHA to try and disambiguate. Unfortunately, this has side effects.
And the one that we're really here to deal with today is that Tor users end up getting a lot of Cloudflare CAPTCHAs, because we sit in front of so many different websites that it's actually pretty easy to imagine that someone using Tor Browser might have to solve a CAPTCHA of ours not just for every website they visit in a given day, but possibly several times over for the same site. And it adds up to, you know, easily dozens and dozens of the street sign things every day. And let me tell you, Tor users love us for this. Those stickers are so popular, it took me four days of CCC to find any.

But things actually aren't as bad as they were when the stickers were designed. As you can see from this extremely helpful scale-free chart, we're letting through more non-malicious Tor requests now than we ever have previously. But at the end of the day, what we care about is that yellow bar of traffic that still gets challenged, because some of it is actually users: real people who are just using Tor Browser to try to browse the web privately. And while we do want to block the attacks that come through Tor, we don't actually want to block those people. In fact, we consider it an enormous problem, such an enormous problem that we've tried about half a dozen things internally to solve it, to make it so that this doesn't happen to Tor users. But none of these things have actually addressed the core issue, which is not so much that you ever need to solve a CAPTCHA at all, but that you end up getting so many of them over the course of a Tor browsing session. So what we really need, what we would really like, is some sort of portable proof of humanness that a user of Tor Browser can carry around with them and use to bypass the CAPTCHA challenges, without de-anonymizing themselves and without compromising the security of the Cloudflare edge.
So we're having to meet a sort of union of security requirements here: both the guarantees we make to our customers and the guarantees that Tor Browser makes to its users to protect their anonymity. We've actually been working on this for almost a year and a half now. The crypto team at Cloudflare knew that we needed to do something about this, but we couldn't figure out what, until about a year ago, when we read a blog post in which Jan suggested that you could probably rate limit accounts in an anonymous manner using blind signatures. Which, by the way, is why you should all be blogging your obvious crypto ideas, because we hadn't thought of this. But when we read this post, we said, well, that sounds cool. Why don't we build something like that?

And what it ended up being is this. It's a plugin for Tor Browser and a signing service that runs on our network edge, such that a user can solve a CAPTCHA and then, along with that solution, submit a whole bunch of blinded tokens. We validate the CAPTCHA solution, sign the blinded tokens, and return them to the user, who can keep them stored in their browser. The next time they would hit a CAPTCHA challenge, they can instead unblind a token and submit it as validation, so they don't have to do the street sign thing again. And this actually gets us a lot of the properties that we wanted. It means that the Tor Browser user who was previously having to solve possibly dozens of CAPTCHAs per browsing session can instead now solve one, hold a bunch of tokens, and not see a CAPTCHA again for the next many websites that they visit. And because of the blinding, we can do this without compromising Tor's anonymity guarantees. And because they are tokens stored in a plugin, and not actually cookies, it means that they'll work across domains and over multiple Tor circuits, which our current cookie-based solution just doesn't.
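To make the blind/sign/unblind dance concrete, here is a minimal sketch using textbook blind RSA with toy parameters. Everything in it (key sizes, the hash-to-message step, variable names) is illustrative only, not the deployed specification:

```python
import hashlib
import secrets
from math import gcd

# --- Signer (the edge): a toy RSA key. Real deployments use >= 2048-bit moduli.
p, q = 61, 53
n, e = p * q, 17                        # public key (n, e)
d = pow(e, -1, (p - 1) * (q - 1))       # private exponent

def to_message(token: bytes) -> int:
    """Hash a random token down to an integer mod n (toy full-domain hash)."""
    return int.from_bytes(hashlib.sha256(token).digest(), "big") % n

# --- Client: pick a random token and blind its hash before submitting it.
token = secrets.token_bytes(16)
m = to_message(token)
while True:
    r = secrets.randbelow(n - 2) + 2    # blinding factor, invertible mod n
    if gcd(r, n) == 1:
        break
blinded = (m * pow(r, e, n)) % n        # the signer never sees m itself

# --- Signer: after a valid CAPTCHA solution, sign the blinded value.
blind_sig = pow(blinded, d, n)

# --- Client: strip the blinding factor, leaving a plain signature on m.
sig = (blind_sig * pow(r, -1, n)) % n

# --- Redemption: the edge verifies with the public key alone.
assert pow(sig, e, n) == m
```

The unblinding works because `(m * r^e)^d = m^d * r (mod n)`, so dividing out `r` leaves exactly `m^d`, the same signature the signer would have produced on `m` directly, yet the signer cannot link it back to the issuance.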
But finding a thing that did all of this and still met all of our security requirements was pretty hard. There are not a lot of options out there. So where we're at with it is straight-up, boring, old, best-of-the-80s blind RSA, with a couple of design tweaks that make it more suitable for the HTTP use case. We've specced this out, and we have some code deployed that does it. I'll give you this link again in a minute. But in practice, we would really like to get away from this. We've looked at a bunch of other blind signature algorithms and even some full-on anonymous credentials, but in practice, none of them are really deployable. They're either a thing that has just never been implemented anywhere by anyone, or they require some sort of esoteric primitive, like pairings, which makes it hard for us to implement in JavaScript and deploy in a browser.

But aside from the algorithm itself, there are also some open questions about protocols of this kind in general. Such as: okay, is this just a whole new de-anonymization vector? Or what do we do about botnets trying to stockpile tokens, or a malicious website that wants to force a user of Tor Browser to drain their own stash of tokens? And questions like these, about the right algorithm and the analysis of the protocol, are where all of you come in, because the point of giving this introductory talk to a room full of cryptographers is that we really need help. If you have questions, we're here. If you have comments, there's a mailing list at torproject.org that you can send them to. If you're sitting here thinking, oh, no, that's total crap, there's no way it will ever work: hang on, let me show you. The next PETS deadline is in February. And this is all on GitHub, so if you prefer to just give us code or comments or pull requests or anything like that, we're there as well. And that's what I've got. Thank you.

Is there a question? Oh, my God. All right.
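As a flavor of what an HTTP-oriented tweak could look like, here is one hypothetical construction (an editor's illustration, not necessarily the deployed spec): bind each redemption to the specific request it authorizes, by deriving a per-token MAC key from the token and its signature and MACing the host and path. The edge, which can re-derive the signature from its private key, recomputes the MAC and rejects replays against other sites:

```python
import hashlib
import hmac

def redemption_mac(token: bytes, sig: bytes, host: bytes, path: bytes) -> bytes:
    """Bind a token redemption to one request (hypothetical construction)."""
    # Per-token key: only a holder of a valid signature can derive it.
    key = hashlib.sha256(b"derive-key" + token + sig).digest()
    # MAC over the request-binding data; any change to host or path fails.
    return hmac.new(key, host + b" " + path, hashlib.sha256).digest()

# Client side: redeem a token for one specific request.
token, sig = b"example-token", b"example-signature"
mac = redemption_mac(token, sig, b"example.com", b"/index.html")

# Edge side: re-derive the signature for the token, recompute, and compare.
expected = redemption_mac(token, sig, b"example.com", b"/index.html")
assert hmac.compare_digest(mac, expected)

# A replay against a different host or path yields a different MAC.
assert mac != redemption_mac(token, sig, b"evil.example", b"/index.html")
```

The point of the binding is that a passive observer who captures a redemption on one site cannot spend the same token somewhere else, since the MAC only verifies for the original host and path.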
So, aside from the obvious concern that there are just a bunch of gotchas in doing these large blinding operations in JavaScript, which I think we actually do have reasonably good answers about, the main problem with that is that any policy that you want bound to these tokens needs to be bound to the public key that's used to sign them. If we want to have them expire, or have them only be valid for a certain subset of customers, or anything like that, we have to do that with key management. And as you may have heard earlier, we don't have a lot of shared state going on in our edge. We'd like to avoid having to have shared state for every different tranche of RSA key that we can possibly think of a use for. So something that doesn't carry this expectation of key-bound policy would be much better operationally for us.

Would it be possible to add language selection to some of these things? I solve CAPTCHAs on sites in various TLDs, and I usually end up having to figure out what the picture is asking me for in Swedish, which is not my language, and it's a big problem for me anyway.

So, for that, you'd have to talk to Google.

Are your customers happy about this? Because, I imagine, a lot of your customers' business models rely on knowing as much as possible about their users, and essentially you're now making it easier for users to access those services anonymously.

That's a good question that I don't have a really 100% correct answer for. Mostly, we already give people the ability to apply customer-specific policy to Tor traffic. So if you've gone ahead and done something like that, like if you really don't want anonymous users, you can already make life harder for them. But if you haven't done that, it probably means that you don't care, and we shouldn't be making it harder by default.

I was wondering what type of attacks you are trying to prevent from Tor users that you will actually still prevent using this system, because it can't be DDoS attacks, right?
Tor is not suitable for DDoS attacks.

So that's actually an interesting answer that I've got for this. When you actually look at the type of traffic that we see coming over Tor that our customers don't want to receive, by volume it's really not what you would think, not exploits or elite Russian hackers. It's more like comment spam and high-bandwidth web scraping that just has no concept of rate limiting and will take down your site without thinking about it. And those sorts of things do come over Tor. And then there's this long tail of weirdness, like SQLMap literally has a --tor flag that will just route all of your traffic through Tor while you're trying to do SQL injection attacks. So it's stuff like that.

I wasn't aware that Cloudflare was in the business of scrubbing the content that their customers receive as well.

I'm sorry, could you repeat?

I wasn't aware that Cloudflare is also in the business of scrubbing the content that their customers receive.

Content? Not really. Oh, the requests. Yeah, so basically we don't check for content so much as for form that indicates it's coming from a Russian botnet or something like that. We're basically just talking about a WAF here. We have a WAF.

So early on, I guess this time last year, maybe even the year before, you guys published some graphs about the rate at which the Tor exits were being blocked. And of course, the methodology behind them wasn't really published anywhere. But just guessing at the methodology and doing some little back-of-the-envelope computations, it looked like you guys were seeing two bad Tor circuits on average, sort of continuously. So it seemed a bit to me like there were actually not very many culprits behind this, literally two on average.

So unfortunately, I really have no idea about that.
Perhaps you do?

Yeah. Is this on? Yeah. So I think those graphs were generated early on. The ones you've seen on these slides use a different methodology. But more relevantly, I think that even if it's just two on average at any time, the nature of Tor is that we have no way of identifying them. So again, if we deploy countermeasures against those, we end up hitting users also. Hence this talk.

What I meant by that is: did you guys try, for example, rotating very quickly? For example, a circuit lasts ten minutes. You could do the detection, and whatever metrics you're using for detection, in a very fast version. I guess you can't do that, because it has to run over a long period of time, right? Never mind.

Is it going to be quick? Yes. Excellent. This is very innovative. Thank you. One of the bits of trying to be anonymous, as I understand it, is looking like everybody else who's using Tor. If other companies that want to do something similar, but a little bit different, take your open source code and do the Facebook challenge bypass specification, or even other CDNs might imagine approaching something like this, it seems like you get a bit or so of de-anonymization per plugin like this that becomes popular in the world. And not just for those who've installed it, of course, but for those who choose not to install it; then they stand out as being weird in that way. So to be safe, and to really meet that anonymization goal, maybe I'm missing something, everybody should be able to use the one that you've got, so there's one plugin, you get it upstreamed, and you're good. If so, are you pursuing that?

Short answer: yes. Longer answer: this is in no way Cloudflare-specific. The way the plugin and the tags on websites and so on are architected, there is a list of public keys, and if you have one of those pinned in your plugin, or many of them even, that's what decides which requests for tokens you respond to.
But which keys you have pinned is detectable, so you still get a bit per key that's pinned. So to make this safe, there has to be one master key, and one entity has to hold it. I mean, you know who I propose.

Who do you propose?

I propose...

Talk to you later. All right, why don't we take the rest of this offline. Thank you, everyone.