Hello DEF CON, thank you for tuning into my talk, broadcast from Raleigh, North Carolina. My name is Austin Allshouse; I'm a research scientist at BitSight, and as part of my job I do a lot of surveys and studies of security best practices across the internet. Today I'm going to walk you through some of the low-level details of a study I did recently involving compromising RSA keys through factorization. While this talk is nominally about how to compromise a specific subset of vulnerable RSA keys, what it's really about is a scalable method for calculating shared factors across large batches of integers, because that is the mechanism by which we're going to do it. There's been a lot of past research on this topic, and many of the researchers have simply attested that they built a custom, scalable, distributed batch GCD implementation to factor keys collected from the internet, but many of these studies have been fairly light on implementation details. So this talk is going to walk you through what distributed batch GCD means and how to implement it yourself in order to break some RSA keys. I'm not going to give a whole RSA recap, but there is one thing you need to know to understand this talk. The first step in producing an RSA key pair is to select two random prime numbers, and the product of those two primes is shared as the modulus of the public key. The security of RSA depends on the fact that, at a sufficiently large key size, it is not tractable to factor that public modulus back into its constituent primes, and the secrecy of those primes is critical to the security of the private key. But while large-integer factorization is a computationally difficult problem, fast and efficient methods do exist for calculating the greatest common divisor of two integers.
This means that if any two RSA keys just happen to choose one of the same primes during key generation, both of those keys can be easily compromised by calculating the greatest common divisor of the two moduli. In theory this should never happen: the number of potential primes to choose from is so mind-bogglingly large that it should never occur by chance. However, almost a decade ago, two research teams found that many RSA certificates collected from the internet do in fact share primes with other certificates, making them trivial to compromise, and they were able to attribute this phenomenon to flawed implementations of the pseudorandom number generators seeding the key generation process. Over the years this phenomenon has been revisited, with researchers collecting and evaluating larger and larger batches of keys, necessitating various big-data approaches to the problem. This culminated in a really interesting talk back at DEF CON 26, in which some folks from Kudelski Security really industrialized the key acquisition process and evaluated hundreds of millions of keys for a variety of weak implementations, including the shared-prime-factor vulnerability I'm discussing today. So the question really boils down to this: if some RSA keys do share primes, and they can be compromised by finding shared factors across them, how do you calculate the greatest common divisor across hundreds of millions of keys? To answer that question we need to go back over 2,000 years to one of the oldest known algorithms, the Euclidean algorithm, which calculates the greatest common divisor of two numbers. It works by recursively calculating remainders between two numbers until their greatest common divisor is reached, which may just be one if the two numbers don't share any common factors.
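As a quick aside, here's a minimal Python sketch of the Euclidean algorithm and of why a shared prime is fatal: once the GCD reveals the common prime, each modulus divides evenly by it to expose its other prime. The small primes here are my own toy values, not real moduli.

```python
def gcd(a, b):
    # Euclidean algorithm: repeatedly replace (a, b) with (b, a mod b)
    while b:
        a, b = b, a % b
    return a

# Two toy "moduli" that accidentally share the prime 101
n1 = 101 * 103
n2 = 101 * 107

p = gcd(n1, n2)          # the shared prime
q1, q2 = n1 // p, n2 // p  # each key's other prime falls out by division
print(p, q1, q2)         # 101 103 107
```

With both primes of each key recovered, reconstructing the private keys is straightforward.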
In the trivially small example on the slide, comprised of four products of prime numbers, calculating the pairwise greatest common divisor of every combination reveals that one pair does in fact share a common factor of seven, because its greatest common divisor is greater than one. While this slide uses small integers for illustrative purposes, these integers could just as easily be real RSA moduli, and this is a perfectly valid way to compromise keys if a shared prime factor does happen to exist within a small batch. The Euclidean algorithm is fast and efficient, but because you have to evaluate every pairwise combination, calculating the greatest common divisor across hundreds of millions of keys could require quadrillions of iterations of the algorithm, which means it simply does not scale to that problem. So, skipping ahead over 2,000 years again, the cryptographer Daniel J. Bernstein published an efficient method for calculating greatest common divisors across batches of numbers. Like many problems in computer science, it uses an intermediate tree data structure to bypass the requirement of calculating every pairwise combination. In simplest terms, Bernstein's method builds a product tree by calculating the products of pairs of numbers in the batch, then repeating this process up successive levels of the tree until the root represents the cumulative product of all numbers in the batch. It then descends back down as a remainder tree, replacing each node with the remainder of its parent modulo the square of that node's value, until the leaves represent the remainders of the cumulative product with respect to the square of each integer in the batch. A final greatest common divisor step is computed on each leaf, dividing the leaf's remainder by its modulus and taking the GCD of the result with the modulus, which reveals whether that particular integer shares a factor with any other integer in the batch.
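The pairwise approach can be sketched in a few lines of Python. The four values here are stand-ins for the slide's four prime products (the slide's actual numbers aren't reproduced here), with two of them sharing the factor seven:

```python
from math import gcd
from itertools import combinations

# Four toy "moduli"; the middle two share the prime factor 7
batch = [3 * 5, 7 * 11, 7 * 13, 17 * 19]

for a, b in combinations(batch, 2):
    d = gcd(a, b)
    if d > 1:
        # any GCD greater than one compromises both numbers
        print(f"{a} and {b} share the factor {d}")
# prints: 77 and 91 share the factor 7
```

For a batch of n moduli this loop runs n·(n−1)/2 times, which is exactly the quadratic blow-up that makes the pairwise approach impractical at internet scale.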
This is a very similar approach to the pairwise Euclidean method, with the key distinction that shared factors are discovered with respect to the cumulative product of the whole batch instead of through pairwise combinations of all the integers. It's a very effective approach for the RSA key factorization problem specifically, because shared factors are in general relatively rare; it's therefore very likely that any factor output by this method will be one of the actual primes used in key generation, and less likely to be some composite value representing multiple shared factors in the batch. I understand this may be difficult to visualize from a verbal description alone, so I'm going to walk you through an explicit example. Here we're using the same prime products as before, which contain two products with a shared factor of seven. Building the product tree is simply a process of pairing off the integers and calculating their products at each level until we reach the cumulative product of the batch, shown in green. After the product tree has been formed, the remainder of each parent node is calculated with respect to the square of its child node. When the bottom of the tree is reached, each leaf remainder is divided by its modulus and the greatest common divisor of that quotient and the modulus is calculated; if this value is not one, that modulus shares a factor with some other modulus in the batch. In this example, the two shared factors of seven are output just the same as with the pairwise Euclidean implementation described earlier. While this implementation is very fast, it does raise a new challenge: these product trees can get very, very large. Such a tree of 150 million 2048-bit RSA moduli would be over a terabyte in size, which can be very difficult to manage, especially on a single machine. So say you don't have a machine with a terabyte of memory.
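The walkthrough above can be condensed into a compact, unoptimized Python sketch of the product-tree and remainder-tree steps. The function names and the toy moduli are my own; this mirrors the structure of the method rather than any particular production implementation:

```python
from math import gcd

def product_tree(moduli):
    # Level 0 is the batch itself; each higher level holds pairwise products
    tree = [list(moduli)]
    while len(tree[-1]) > 1:
        level = tree[-1]
        tree.append([level[i] * level[i + 1] if i + 1 < len(level) else level[i]
                     for i in range(0, len(level), 2)])
    return tree

def batch_gcd(moduli):
    tree = product_tree(moduli)
    # Descend: each node becomes (parent's remainder) mod (node value)^2
    rems = tree.pop()  # root level: the cumulative product of the batch
    while tree:
        level = tree.pop()
        rems = [rems[i // 2] % (n * n) for i, n in enumerate(level)]
    # Final step per leaf: gcd((remainder / n), n)
    return [gcd(r // n, n) for r, n in zip(rems, moduli)]

moduli = [3 * 5, 7 * 11, 7 * 13, 17 * 19]
print(batch_gcd(moduli))  # [1, 7, 7, 1]
```

The two entries of 7 correspond to the two moduli that shared a prime, exactly as in the pairwise version, but the batch is scanned with one tree traversal instead of a quadratic number of GCDs.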
There's actually a pretty straightforward way to make this calculation much more manageable: instead of building one very large product tree, you build a few smaller ones. Breaking that 150 million key batch into five smaller batches produces product trees roughly 180 gigabytes in size, which is quite a bit more manageable and potentially processable on a single machine. There is a major downside to breaking the tree up, however: in order to get coverage of the shared factors across the different batches, the remainder trees must be calculated with respect to every other tree, which requires a permutation step across all the trees. While this is less efficient, in practice it can actually be faster, because all the arithmetic is being done on much smaller numbers and there's no bottleneck from arithmetic on the enormous integers at the root of a massive monolithic product tree. To walk you through another explicit example, here we have two batches of prime products. The first batch is the same one as before, sharing a prime factor of seven; the second batch has a shared factor of 23; and across the two batches there is a shared factor of 17. When the remainders for each tree are calculated against the cumulative product of both trees, all of these shared factors fall out at the bottom. The permutation of the trees is really important: that factor of 17, shared across the two batches, would not have been discovered if we were only evaluating the trees within each batch. By calculating the product trees and then permuting the remainder trees in this way, the calculation of shared factors across a huge number of integers can be broken into batches and parallelized across any number of machines without any outrageous memory requirements.
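A simplified, self-contained sketch of the cross-batch idea follows. The function names and batch values are my own illustrative choices, and for brevity each batch's root product is computed directly and the descent to each leaf is a direct reduction rather than a full remainder tree; the key point is that each batch folds in every other batch's root product, reduced modulo the square of its own root, so the full cumulative product is never materialized in one place:

```python
from math import gcd
from functools import reduce

def batch_root(batch):
    # Root of this batch's product tree (computed directly here for brevity)
    return reduce(lambda a, b: a * b, batch)

def cross_batch_gcd(batches):
    roots = [batch_root(b) for b in batches]
    results = []
    for j, batch in enumerate(batches):
        m2 = roots[j] ** 2
        # Fold every batch's root into this one, reducing mod root_j^2 as we
        # go -- this is the "permutation" step across trees.
        r = 1
        for root in roots:
            r = (r * (root % m2)) % m2
        # Reduce down to each modulus (a full remainder tree in a real build)
        results.append([gcd((r % (n * n)) // n, n) for n in batch])
    return results

batch1 = [3 * 5, 7 * 11, 7 * 13, 17 * 19]   # shared factor 7 within batch
batch2 = [23 * 29, 23 * 31, 17 * 37]        # shared 23 within, 17 across
print(cross_batch_gcd([batch1, batch2]))    # [[1, 7, 7, 17], [23, 23, 17]]
```

Note that the 17s appear in both batches' results even though neither batch contains a colliding pair on its own; evaluating each batch in isolation would have missed them.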
The sizes of the trees and the number of batches can really be tailored to the compute and memory resources available. Here's an example architecture I used to factor 86 million RSA keys using just commodity hardware and no specialized software. The factorization code was all written in Go, and all it really does is implement the product tree and remainder tree logic covered earlier. The arithmetic was done with the native C GNU Multiple Precision (GMP) library, since it is allegedly quite a bit more performant than its Go counterpart. RSA moduli were read from S3, and the product tree levels were stored to EBS, serialized using Go's built-in gob library. Concurrently calculating the various values at each tree level was handled with goroutines, and orchestration of all the tree permutations was just a simple shell script. No specialized software, no big data frameworks needed. I used that architecture to factor about 86 million keys from certificates collected from the internet over about a three-month period and found some interesting results. Fewer than 50,000 of those keys could be compromised due to sharing a prime factor. This is a much lower number than I was expecting, and also much lower than had been reported in prior years. As a sanity check, I went back and collected samples of keys dating back six years and discovered that, sure enough, the prevalence of this type of vulnerable key has decreased dramatically over the years. The chart on the slide represents the number of keys that could be factored from a random sample of 100 million keys collected in a given year, based on sharing a prime factor with some other key in that same sample. I think the dramatic decline observed here is really a testament to the impact of prior research, as it appears to have pushed vendors to address this problem; it is far less prevalent today than it was just a few years earlier.
Of the keys that were still vulnerable, almost all appeared in networking devices and embedded systems. But I think the question still remains: if this issue has largely been remediated and is trending downward pretty dramatically, why are there still so many vulnerable keys on the internet? Reviewing the certificate validity lifetimes of the vulnerable keys provides some insight into that. The charts here show histograms of the validity dates of vulnerable certificates compared to a random sample of certificates from the internet. The long tail of the not-valid-before dates hints that these certificates may just belong to really old devices, and the not-valid-after dates on the right, where over 10 percent had expired more than a decade ago, help reinforce that. Many of the vulnerable certificates likely represent really old networking equipment that's probably lost in a closet somewhere, still connected to the internet, with owners and operators who don't even realize it's there and online. Of the roughly 150 million total keys analyzed, only a single vulnerable key was signed by a trusted third-party certificate authority. Every other vulnerable key was either self-signed or signed by an internal CA that isn't publicly trusted. This suggests that the likely culprits are devices that automatically generate their own certificates, which is somewhat reinforced by the absurdly long validity durations present in the second chart on the right. This is likely done as a convenience by vendors, and of course it is not in line with best practices for certificates. When reviewing the organizations actually hosting these vulnerable devices, the trends are generally what you would expect.
Organizations and industries that typically have very mature security programs and invest heavily in security, such as financial services, were the least likely to be hosting vulnerable devices, while industries more notorious for perhaps lax practices, such as utilities, were more than 10 times as likely to be hosting a vulnerable device as their low-risk counterparts in financial services, insurance, and legal. Finally, I saved this chart for last because I think it is really the most important chart of the whole presentation. It represents the relationships between vulnerable prime values, where an edge exists between two primes if they appeared in the same RSA modulus. The coloring represents the product families the primes appeared in: for example, one color is Huawei switches, one is D-Link routers, and so on. It becomes very obvious that the relationships between these primes are mostly disjoint across different products. This is really important, because if they are disjoint, attempting to find shared factors across product families is somewhat of a fruitless exercise; the vendor-specific random number generation flaws appear to create prime collisions only within a given product family. The implication is that you don't necessarily need a big-data approach of collecting some huge corpus of keys to compromise keys via this method. A small collection of keys specific to a given networking device or embedded systems product is as likely to be vulnerable as a massive collection of random keys harvested from the internet.
Much of the past analysis of these flawed keys has focused on keys collected from the public-facing internet, but I think this chart shows there may be opportunities to find additional vulnerable products among devices that aren't typically exposed directly to the internet, for anyone who's able to make product-targeted key collections behind the external firewall of a large organization, or perhaps across many smaller organizations. So, to wrap things up: almost a decade after the discovery of this phenomenon of prevalent shared primes in certificates on the internet, there is still a fairly large number of devices that are factorable due to these shared primes. However, this seems to be primarily the result of really old devices, not of new vulnerable products. The culprits appear to be primarily automatically generated certificates from networking equipment, so, you know, maybe don't trust those certificates. And finally, you don't really need specialized software or a massive corpus of keys to compromise keys in this way; small, targeted collections of keys from specific networking equipment or embedded systems products can potentially yield results. To end things, I've published a reference implementation of the distributable batch GCD method described in this talk at the link on the slide; it demonstrates the successful factorization of a small batch of actual RSA moduli. This implementation is really just for illustrative purposes: it was written in Python to be simple, clear, and concise, but as a result it's also very, very slow and will not scale. If you want to do this on larger batches of moduli, I highly recommend translating the code into your favorite compiled language. And I'll close with that. Thank you for tuning in, and I hope you enjoy the rest of DEF CON 29.