 Hi, everyone. My name is Niels. I'm from Switzerland. I came today with my colleague Yolan. We are in the research team at Kudelski Security. I like to fish and I'm really into video games. And we are going to talk about collecting public keys and breaking them at scale. So, who has a public key on GitHub, GitLab, or has pushed a PGP key to a key server? Raise your hand. Yeah, SSH key, PGP key, everything. Okay, so a lot of people. So, what if I told those of you who raised your hand that we have your key? So far so good. Nothing to worry about, right? Now, what if I told you we could actually break those keys and retrieve the private key from your public key? Sounds more fun, right? And if we can do it, guess who can do it too? Like the NSA. That's easy peasy for them, right? So, one question remains: how do you retrieve the private key from the public key, Yolan? Well, that's a really good question, Niels. So, I'm a math guy, so I'll do a little crypto recap first. Most keys out there use the RSA cryptosystem. RSA is really simple. You just need to take two large prime numbers. Okay, that slide is a bit dry. It won't do. Two large prime numbers. Much easier. So, you take two primes P and Q, really large, 2,000 or more bits ideally. You multiply them together. Awesome. You get N. That is your public modulus. It's your public key. And the security of RSA is all based on the fact that you cannot easily factor N into P and Q. It's really hard to factor large numbers. So far so good. You have your private key, P and Q, and you published N on GitHub, a PGP key server, whatever. But what happens if somebody else generated another public key M using the same prime Q as you did? Let's say he has a key where Q times R equals M. You can actually really easily find the greatest common divisor of N and M, which is Q, and dividing N by Q then gives P. That's something everybody learned. It's called Euclid's algorithm. And that's easy peasy. So, that's really bad for RSA.
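To make the shared-prime problem concrete, here is a minimal sketch in Python. It is not the speakers' code. The helper name `recover_private_exponent`, the toy primes, and the public exponent 65537 are assumptions chosen for illustration, and the primes are far smaller than anything used in real RSA keys.

```python
from math import gcd

def recover_private_exponent(n, shared_prime, e=65537):
    """Rebuild the RSA private exponent d once one prime factor of n is known.

    Hypothetical helper for illustration; e=65537 is an assumed public exponent.
    """
    other_prime = n // shared_prime
    if shared_prime * other_prime != n:
        raise ValueError("shared_prime does not divide n")
    phi = (shared_prime - 1) * (other_prime - 1)
    return pow(e, -1, phi)  # modular inverse, needs Python 3.8+

# Illustrative primes (Mersenne primes), far smaller than real RSA primes.
P = 2**61 - 1
Q = 2**89 - 1
R = 2**107 - 1

N = P * Q  # your public modulus
M = Q * R  # someone else's modulus, built from the same prime Q

shared = gcd(N, M)  # Euclid's algorithm, fast even for very large inputs
d = recover_private_exponent(N, shared)

ciphertext = pow(42, 65537, N)     # textbook RSA encryption of the number 42
recovered = pow(ciphertext, d, N)  # decryption with the rebuilt private exponent
```

Here `shared` comes out equal to `Q`, and `recovered` is the original message, even though only the two public moduli were used as input.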
Now you will tell me, ah, RSA is no good. There are other fun cryptosystems out there, like, you know, elliptic curve cryptography. Elliptic curve cryptography does not rely on the hardness of factoring large integers. It relies instead on the hardness of solving the discrete logarithm problem. And it is not impacted by having a lot of keys, unlike RSA, where you can just try to batch factor all the public keys out there. So elliptic curve cryptography is more secure than RSA against large-scale attacks like the ones we tried, but we'll also be talking a bit about ECC at the end of our talk. So, what kind of attacks can we actually do against public keys? We want to remain silent. You know, we don't want to take a branch and go hit the guy to ask him for his password. He will not die, but he will notice, you know. He will ask, why do you hit me? So, we can try to use the Return of Coppersmith's Attack, ROCA, which was, like, last year: smart cards were generating bad primes, and you could easily, well, fairly easily factor the moduli they produced. Another problem you can have is invalid parameters in DSA, for example, or a key size that is too small for your ECC or DSA or even for RSA. If you take a hundred-bit prime, it's no good. You can easily factor those. Invalid curve attacks are sort of the same, but we'll see later that it's not actually the same. So, in the end, the one we're really talking about today is RSA modulus factorization at scale. It means we are doing batch GCD. Batch GCD has actually already been done in the past by academics, but it was done at academic scale, on datasets of maybe 10 million keys, 20 million keys, 80 million keys. We did it at a much larger scale. These are all known attacks. Anybody can do them. And they are completely passive. You won't even know if you are vulnerable to them. So, Niels, how do we collect these keys? So, keys, public keys, are everywhere.
And we have focused on the three most common key container types, namely certificates, so the X.509 certificates you can find in HTTPS, for instance, SSH keys, and PGP keys. We have found quite a few interesting results and also a few crazy things that we will talk about later in this talk. For example, as you know, certificates have a validity period. A certificate is supposed to start being valid at some point and expire at a later date. And some certificates actually have a negative validity period. I mean, why would you do that? Most keys we have come from certs. About 70% of our dataset came from certificates, so that's about 240 million keys. We also have 90 million SSH keys and about 10 million PGP keys in our database. Those certs mostly come from HTTPS scans that we perform using a custom tool that I will talk about later. The SSH keys mostly came from a study that was made by CRoCS. Those guys generated a lot of SSH keys with multiple software libraries and smart cards, and they tried to see if there were any vulnerabilities in those libraries. We were able to validate their results. Namely, we found, just as they did, that one of the smart card models was generating keys with common factors. So, that was really bad. As for PGP keys, most of them came from the pool of SKS key servers. So, why is it interesting to have as many public keys as we can? When you run batch GCD on RSA keys, you are looking for common factors, right? So, the more keys you have, the higher the chance that two keys in the dataset share a factor. This is why it's good to have as many as possible. We currently have over 340 million keys, and this is still growing. We are still ingesting keys from Certificate Transparency. This is a project that was initiated by Google, and it's basically a pool of log servers where you can get certificates. So, we can grab the keys from there.
And also, from those log servers, we were able to collect a list of 270 million domain names and subdomains. You can do a lot of stuff with that. Yeah, it's way more than the Alexa top one million. Oh, yeah. So, we found that today RSA is still the most common key type. 95% of keys are still RSA. The ROCA attack and batch GCD target exactly that, so be careful. EC keys are getting more popular, and this is basically what I guess you should start using today. Then we still have some DSA keys. I mean, who uses that today anyway? And a few other, less well-known types. A few words about tools. We've been using a tool that was developed by one of our colleagues. It's called Scannerl. It's not just a network scanner. It's also a fingerprinting engine. It's written in Erlang. It's open source. It's on GitHub. You can download it today. Our parsers are mostly written in Python. We also have some Golang code and some Bash scripts in there. The code will be open source on GitHub. It is available today. We have the link at the end of this presentation. We have also been using Apache NiFi to define the data pipelines that ingest the keys into our database. That database is stored in HDFS, and we use Presto to run SQL queries on the dataset. Then, for breaking keys, now that we have them in our database, we wrote a tool in Chapel to compute the batch GCD of all of those keys. So, basically computing the GCD between every pair of RSA moduli in there. This is a distributed implementation, so it basically scales. You can throw more machines at it. It's constant in memory. If you need it to be faster, just add more machines. We also check for the ROCA vulnerability. And we run some checks on EC keys, like invalid parameters, key lengths, and so on. So, we have a short demo for you. Here we have two SSH public keys, aka1.pub and aka2.pub.
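For readers who want to reproduce the modulus extraction on their own SSH keys, the following is a minimal sketch and not the script used in the demo. It assumes the standard one-line OpenSSH public key format (`ssh-rsa <base64> <comment>`), whose blob holds length-prefixed fields for the key type, the exponent e, and the modulus n. The function names, including the `encode_ssh_rsa_line` helper used only to build a test key, are hypothetical.

```python
import base64
import struct

def _read_field(blob, offset):
    """Read one length-prefixed field (4-byte big-endian length, then data)."""
    (length,) = struct.unpack(">I", blob[offset:offset + 4])
    start = offset + 4
    return blob[start:start + length], start + length

def rsa_params_from_ssh_line(line):
    """Return (e, n) from a single 'ssh-rsa AAAA... comment' line."""
    parts = line.split()
    if len(parts) < 2 or parts[0] != "ssh-rsa":
        raise ValueError("not an ssh-rsa public key line")
    blob = base64.b64decode(parts[1])
    key_type, offset = _read_field(blob, 0)
    if key_type != b"ssh-rsa":
        raise ValueError("key type inside the blob is not ssh-rsa")
    e_bytes, offset = _read_field(blob, offset)
    n_bytes, offset = _read_field(blob, offset)
    return int.from_bytes(e_bytes, "big"), int.from_bytes(n_bytes, "big")

def encode_ssh_rsa_line(e, n, comment="demo"):
    """Hypothetical helper: build an ssh-rsa line from (e, n) for testing."""
    def field(data):
        return struct.pack(">I", len(data)) + data
    def mpint(value):
        # The extra byte keeps the number positive when its top bit is set.
        return field(value.to_bytes(value.bit_length() // 8 + 1, "big"))
    blob = field(b"ssh-rsa") + mpint(e) + mpint(n)
    return "ssh-rsa " + base64.b64encode(blob).decode("ascii") + " " + comment

# Round trip on a toy modulus (product of two Mersenne primes, illustration only).
toy_n = (2**61 - 1) * (2**89 - 1)
line = encode_ssh_rsa_line(65537, toy_n)
parsed_e, parsed_n = rsa_params_from_ssh_line(line)
```

With two real key files, calling `rsa_params_from_ssh_line` on each line and passing the two moduli to `math.gcd` is enough to test that one pair.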
We run a script that extracts the RSA modulus from each key and computes the GCD. Since the GCD is larger than one, we can retrieve the private key. Then we just copy-paste that key to a file. We push the original public key to a test machine, just to test whether we can log in to it via SSH. Now that the key is on there, we can try to log in via SSH using the private key we just reconstructed. And guess what? We're in. Boom, that machine has been pwned. So, here we just computed the GCD between two keys, for the example. But what we actually did is the GCD between all key pairs. And we have made a website that you can visit today. It's at keylookup.kudelskisecurity.com. You can upload your key and get it tested against our own dataset. We promise we'll tell you if we find your private key. Do not push the private key; we will find it. So basically it looks like this: you just have a form where you can copy-paste an SSH key, a cert, or a PGP key. You hit submit. Either the key is already in our database, so we can give you an answer immediately, or we add it to the processing queue. Then you can just check again later, and it will tell you whether the key is vulnerable or not within about an hour. So, how does that work behind the scenes? Yeah, you know, we have a corporate policy saying no Bitcoin mining, so we had to find something to do. Yeah, so we had to put that 280 vCPU cluster to good use. We also need about 2 terabytes of storage, because we keep intermediate computations for the batch GCD so that we can reuse them and not recompute everything when we just want to test new keys against the full dataset. We just recompute whatever changes. If we test just one more key, it takes maybe 10 to 20 minutes. 30 keys at once is about an hour.
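Computing a GCD for every pair of keys one by one does not scale to hundreds of millions of moduli. The sketch below shows the commonly published product-tree and remainder-tree formulation of batch GCD on a single machine. It is an assumption that this resembles what the speakers' distributed Chapel tool does internally. The function names and the toy moduli are illustrative only. The levels of the product tree are the kind of intermediate result that could be stored and reused.

```python
from math import gcd, prod

def product_tree(values):
    """Return the levels of pairwise products; the last level is [product of all]."""
    tree = [list(values)]
    while len(tree[-1]) > 1:
        level = tree[-1]
        tree.append([prod(level[i:i + 2]) for i in range(0, len(level), 2)])
    return tree

def batch_gcd(moduli):
    """For each modulus n, return gcd(n, product of all the other moduli)."""
    tree = product_tree(moduli)
    remainders = tree.pop()  # [P], where P is the product of every modulus
    while tree:
        level = tree.pop()
        # Push P down one level, reducing it modulo the square of each node.
        remainders = [remainders[i // 2] % (x * x) for i, x in enumerate(level)]
    # (P mod n^2) // n equals (P / n) mod n, so one small gcd per key remains.
    return [gcd(r // n, n) for r, n in zip(remainders, moduli)]

# Toy moduli built from Mersenne primes; the first two share the prime q.
p, q, r = 2**61 - 1, 2**89 - 1, 2**107 - 1
s, t = 2**127 - 1, 2**521 - 1
moduli = [p * q, q * r, s * t]
results = batch_gcd(moduli)
```

A result of 1 means no shared factor was found for that key. Any other value is a nontrivial divisor: here the first two entries of `results` are `q` and the third is 1.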
And if you want to check a large batch of keys, like 5 million keys, it can take up to 24 hours. That data is stored in HDFS. We have a cluster of 10 data nodes. We use partitioned Presto tables to have fast lookups. And the scanner is deployed on 50 machines, so we scan from 50 different IP addresses at once. That's a really nice scanning infrastructure. So Yolan, what good results and cool stuff have we found? Well, we've broken a few keys, like, you know, 210,000 keys. Most of them were certificates, and some of them are actually in use today. So we could perform man-in-the-middle attacks against these websites really easily and passively. We also broke 3,000 SSH keys, most of them host keys, so again that allows for man-in-the-middle attacks. But if you are using an SSH key on your GitHub page and we broke it, maybe you're using the same SSH key on your boxes. You know, you shouldn't do that. PGP keys, that's bad, because it means we can decrypt what you wrote, we can sign messages instead of you, and so on. So that's really bad. What's fun with PGP keys is that a lot of the keys we broke actually had more than two factors. So that actually saves your day, if your key is not using just two primes but three. And it's really strange, because it's not that common in implementations today to have three prime numbers. As for the ROCA attack, we actually found PGP keys vulnerable to ROCA on Keybase, GitHub, GitLab, everywhere. So double-check your keys. You can check them using Python scripts offline on your computer or using our keylookup website. ROCA is not that bad, because if your key is in the 3,000 or 4,000 bit range, you're still safe. It will take way too long to actually compute the primes from the public key. But if you're on the weak side, like, you know, 2,000 bits, it's feasible. You know, routers have bad randomness, they say. Well, they still do. A lot of routers we've seen since 2017 have common factors.
So some of them even have broken, you know, comments in their certificates, like who is using .com. And in 2015, according to our historical scan data, a lot of D-Link routers were having problems. And that's fun, because a previous study done in 2016 about batch GCD found the problem, and they got it patched. That's nice. A few certificates are not valid before 2040. Wait, I'll maybe have a quantum computer by then. Why would you do that? Bad idea. ECC keys, on the other hand, are getting more and more used in practice, which is good. Certificates and PGP mostly are using them more and more. On SSH, the most used curve is still a SECP curve, yeah, a NIST curve, which is not bad per se. But Curve25519 is growing. It's in third position right now, but maybe next year it will be second. It still has a bit to go before being first. So, more and more ECC keys, as I said. When we scanned for RSA keys, we always found about the same amount of new keys. But for ECC, we are finding more and more keys, and that's really good. And actually, Let's Encrypt will soon be able to issue a full ECC-based certificate chain. So it's cool. And now, what if I told you that some people are using their SSH key to send encrypted PGP mails? That sounds strange. People do. We found keys that are reused as both PGP and SSH keys, or in both SSH and X.509 certificates, which is completely insane. It doesn't make sense. You should just generate new keys, you know. Most people using PGP have only one subkey. Why? Just use multiple, you know. So if one of the subkeys is broken, you still have another one to save you. Some of the keys we found had more than two factors. Most of them were PGP, as I already told you. And DSA is dead. OpenSSH deprecated it in 2015, three years ago. Nowadays, only about 3,000 certificates were still using DSA as a signature algorithm, and less than one percent of SSH keys were DSA-based. So that's really good. And if you're still using DSA, stop.
With DSA, some people were using 256-bit keys, which doesn't make sense. So yeah, just stop using DSA. So in the end, mind your keys, because anybody can do these same kinds of silent attacks. And maybe they already do. Thank you for your attention today. If you have questions, we'll be outside, or you can reach us on Twitter. And thank you.