This lecture is part of Berkeley Math 115, an introductory undergraduate course on number theory, and it will be about factorization. So the problem is we're given a number n, and we know or suspect for some reason that it's not prime, so we want to factorize it as a product of two numbers x and y, and the problem is to find x and y explicitly. In practice, n might be really large; it might have, say, 100 digits or something like that. What we could do is just test whether n is divisible by the various primes up to about the square root of n. And this works just fine if n is reasonably small. The problem is that once n starts getting large, this becomes prohibitively slow: if n has about 100 digits, for example, it would take more than the age of the universe to factor n like this. So we need a better method.

We mentioned another method earlier, which is to try to write n = a^2 - b^2, so that n = (a - b)(a + b). This is an old method sometimes used by Fermat and other people a few hundred years ago, and it sometimes works quite well if n is around 1,000 or 10,000 or so. The trouble is that for it to work well, it depends on the factors x and y of n being fairly close to each other, so that b is reasonably small; it doesn't work all that well if n is a product of two factors, one of which is much bigger than the other. There are actually ways of speeding this method up using sieving techniques, which are sometimes used, but as it stands it doesn't seem to be a particularly fast method.

So what I'm going to do is discuss a couple more methods for factorizing n. These were found by the mathematician John Pollard. In fact, he seems to have had a rather interesting career, or at least a rather unusual one: he got a PhD without actually having a PhD advisor. He was working by himself, published a few papers, and managed to get a PhD based on the papers he'd published.
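Both elementary methods just mentioned are easy to write down; here is a minimal Python sketch (the function names are mine, and fermat assumes n is odd and composite):

```python
from math import isqrt

def trial_division(n):
    """Find the smallest prime factor of n by testing divisors
    up to sqrt(n).  Fine for small n, hopeless for 100-digit n."""
    for d in range(2, isqrt(n) + 1):
        if n % d == 0:
            return d
    return n  # n itself is prime

def fermat(n):
    """Fermat's method: look for a with a^2 - n a perfect square b^2,
    so that n = a^2 - b^2 = (a - b)(a + b).  Fast when the two factors
    of n are close together, slow when they are very unbalanced."""
    a = isqrt(n)
    if a * a < n:
        a += 1
    while True:
        b2 = a * a - n
        b = isqrt(b2)
        if b * b == b2:
            return a - b, a + b
        a += 1
```

For example, trial_division(7913) returns 41, and fermat(7913) returns (41, 193).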
Not having an advisor may be why he came up with such original ideas; I'm not sure. Anyway, the first method is going to be called Pollard's rho method; I'll explain why it's called the rho method at the end of this. So we've got this number n, and the problem is to find a factor p of n, where p is some prime number. And here's what we could do. Suppose we choose some random numbers u1, u2, up to uk, and suppose two of them, say ui and uj, are congruent mod p. Then we can find a factor of n by taking the difference ui - uj and computing its greatest common divisor with n. With luck, this might be the prime p. I mean, it might not; it might turn out to be the whole of n, for example. But this is a sort of probabilistic algorithm that will sometimes work. So let's analyze this idea. First of all, how many numbers k do you need to take in order to have a reasonable chance of having two of them congruent mod p? Well, for this, we recall the birthday paradox, which says that if you have about 30 people in a room, there's a pretty good chance that two of them have the same birthday, which is quite surprising at first sight. But if you stop and think about it: with 30 people, there are going to be about 30 times 29 over 2, which is about 30 squared over 2, so about 450 pairs of people. On the other hand, the number of days in a year is 365. So roughly speaking, we're allowed 450 guesses, each of which has about a one in 365 chance of being right. You need to go to probability theory to calculate the exact probability, but it's pretty clear there's going to be a reasonable chance that one of these 450 guesses will be correct. So a rough rule of thumb is that if we pick about square root of n objects from a set of n, there's a reasonable chance that two of them are the same. I mean, maybe you'd want to pick two times the square root of n or ten times the square root of n or something. It doesn't really matter.
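The birthday estimate is easy to check empirically. Here's a quick simulation (a sketch; the trial count and the random seed are arbitrary choices of mine):

```python
import random

def birthday_collision_rate(people=30, days=365, trials=10_000, seed=0):
    """Estimate the probability that among `people` uniformly random
    birthdays, at least two coincide."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        birthdays = [rng.randrange(days) for _ in range(people)]
        if len(set(birthdays)) < people:  # some birthday repeated
            hits += 1
    return hits / trials
```

The estimate comes out near the true value of about 0.71, and the same reasoning says that roughly square-root-of-p random residues mod p give a good chance of a collision mod p.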
The order of magnitude is roughly the square root of n. So if we're picking numbers u1 up to uk and want two of them to be congruent mod p, how many do we need? Well, we're going to need about the square root of p of them. I mean, maybe twice the square root of p or whatever, but that's roughly how many we need. So here's the basic idea of the method: you pick k elements, where k is about the square root of what you suspect the smallest prime factor to be, check all pairs of them, and work out the greatest common divisors. And then you think about this a bit, and you realize this method is actually completely useless, because you need to check about k squared pairs, and the trouble is that k squared is going to be about p. So the number of steps this takes is about equal to the smallest prime factor of n, and we haven't really saved anything at all; we could have just checked all primes up to the square root of n, and that would have been just as fast. So at first sight, this method seems kind of stupid. It just doesn't work. But Pollard came up with this really ingenious method for vastly speeding it up. What you do is select the numbers u1 to uk in a rather cunning way, which makes it much easier to check whether two of them are congruent. You pick some number u0, and then you take u1 = f(u0), u2 = f(u1), u3 = f(u2), and so on, where f is some function, say some sort of polynomial. And which polynomial? Well, most reasonable polynomials will work as long as they're not too simple. A reasonable choice is just f(x) = x^2 + 1; this seems to work reasonably well. So how does that help? Well, notice that if ui is congruent to uj modulo p, then applying f to both shows that the next pair is congruent modulo p as well, because the term after ui is just f(ui), and the term after uj is just f(uj).
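We can watch this propagation happen by iterating f(x) = x^2 + 1 modulo a small prime; here's a tiny sketch (the prime 41 in the usage note is my choice of example):

```python
def rho_sequence(p, u0=2, length=12):
    """Iterate u -> u^2 + 1 modulo p.  Once two terms agree mod p,
    all subsequent terms repeat too, so the sequence is eventually
    periodic."""
    seq = [u0 % p]
    for _ in range(length - 1):
        seq.append((seq[-1] ** 2 + 1) % p)
    return seq
```

For p = 41 and u0 = 2 the sequence runs 2, 5, 26, 21, 32, 0, 1, 2, 5, 26, ...: once u7 equals u0, every later term repeats, and the whole sequence cycles with period 7.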
So we don't need to check all pairs ui and uj, because when we check one pair, we're sort of automatically testing a large number of other pairs. What we do is just check u1 - u2, then u2 - u4, then u3 - u6, then u4 - u8, and so on. So we're just checking the pairs ui and u2i, taking the greatest common divisor of ui - u2i with our number n. And if the method works for some pair ui and uj, then it's quite likely to work for one of these special pairs ui, u2i. Roughly speaking, every time we check one of these pairs, we're also checking a large number of other pairs. For instance, when we check u4 - u8, we're also implicitly checking u3 - u7 and u2 - u6 and u1 - u5 and so on. So we get a large number of checks for free, and we can sort of estimate the running time of this algorithm. Roughly speaking, the number of checks we have to do is about the square root of what it was before, so the running time ends up being about the square root of the smallest prime factor. Well, that's not actually quite true. It's rather hard to estimate the running time, and in the worst case we could be really unlucky and this method might take an absolutely huge amount of time. So this is a sort of average expected running time, and the average expected running time is about the square root of the smallest prime factor, which is at most about the fourth root of n. We might actually be able to do better than the fourth root of n, because quite often the smallest prime factor of n will be quite a lot less than the square root of n; prime factors have a tendency to be kind of unbalanced. So on average, we expect this to be quite a lot faster than checking all primes up to the square root of n. So let's give an example of this method. We're going to take f(x) = x^2 + 1, and we're going to factor 7913.
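In code, the rho method as just described might look like this (a minimal sketch; the convention of returning None on failure is my own choice):

```python
from math import gcd

def pollard_rho(n, u0=2):
    """Pollard's rho method with f(x) = x^2 + 1.
    x runs through u1, u2, u3, ... while y runs through
    u2, u4, u6, ..., so at step i we test gcd(u_2i - u_i, n)."""
    f = lambda u: (u * u + 1) % n
    x = f(u0)   # u1
    y = f(x)    # u2
    while True:
        d = gcd(abs(y - x), n)
        if d == n:
            return None  # unlucky; retry with another seed or another f
        if d > 1:
            return d
        x = f(x)       # u_i  -> u_{i+1}
        y = f(f(y))    # u_2i -> u_{2i+2}
```

Running pollard_rho(7913) returns the factor 41.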
And we're going to start with the number 2, and then we take 2^2 + 1 = 5, then 5^2 + 1 = 26, and we keep going on like this; the numbers we get are 2, 5, 26, 677, 7289, 1640, 7094, and then 6070, which is u7. Notice we're reducing modulo 7913 at each step, in order to stop these numbers getting bigger and bigger. Continuing, we get 1973, 7447, 3506, 3148, 2829, 3199, and then 2093, which is u14. And all the time, we're checking the pairs ui, u2i: we take the greatest common divisor of ui - u2i with 7913 and see what it is. Most of the time this won't be very interesting; it'll be 1, or occasionally 7913. But when we get to gcd(u14 - u7, 7913), we find this is actually equal to 41, so we have found a prime factor. By the way, notice that gcd(u13 - u6, 7913) is also equal to 41, and we never actually checked u13 - u6 directly; in fact, we didn't need to, because we eventually caught it, so to speak, when we got to u14 - u7. So that's Pollard's rho method; it gives a substantial improvement over the obvious method of just checking all primes up to the square root of n. By the way, I should say why it's called the rho method. Well, it's because of the following picture.
Suppose you look at the numbers a, f(a), f(f(a)), f(f(f(a))), and so on, where a is your initial seed, all modulo some prime p. We get more and more of these, and eventually we get a coincidence: two of them are equal mod p. And once two of them are equal, say u4 equals u11, then u5 will also be equal to u12, and u6 will be equal to u13, and so on. So we start at one point mod p, carry on for a while, and eventually go round and round and round in a cycle. This picture, a tail leading into a loop, looks a little bit like the Greek letter rho, and that's why it's called the rho method: you kind of start at the tail of the rho and end up going round and round the cycle.

So now I have a quite different method, also due to Pollard, for factorizing numbers. This is called Pollard's p - 1 method, and the terminology is a bit confusing, because that is the letter p, and it shouldn't be confused with the Greek letter rho of Pollard's rho method. I don't know whether the terminology was deliberately made confusing or not. This is actually not a general-purpose method: it's good for finding factors p of n with p - 1 smooth. Well, what does smooth mean? It means being a product of small primes. And what does small mean?
Well, it doesn't really have a precise definition, but some numbers obviously are smooth. For instance, if p = 65537, then p - 1 = 65536 = 2^16 is about as smooth as a number can be. On the other hand, you sometimes get primes p such that 2p + 1 is also prime, and then 2p + 1 is certainly not smooth, because (2p + 1) - 1 = 2p is just the product of the prime 2 and one very large prime. So Pollard's method is good for finding factors p such that p - 1 is smooth, and it doesn't necessarily work terribly well on primes like 2p + 1. Anyway, I'll show you how it works. What we do is pick some number m which is a product of many small primes; we might pick, say, m = 2^4 × 3^2 × 5 × 7 × 11 × 13 × 17 × 19, where we've taken all prime powers that are at most 20. This would be a fairly typical choice for m, except of course that instead of going up to 20, if you had a big computer you might go up to a few billion or something. And now the idea is this: we pick some number a at random (we don't really care what it is, as long as it's not 0 or 1 or something like that), we look at a^m - 1, and we take its greatest common divisor with n, the number we're trying to factor. If p is a factor of n such that p - 1 divides m, then p is going to divide a^m - 1, because a^m is congruent to 1 mod p: the exponent m is divisible by p - 1, so here we use Fermat's little theorem. So if p - 1 is a product of some of these prime powers, then we've found a factor of n rather easily. And we notice that a^m mod n can be worked out easily using fast exponentiation; remember, we can work out powers modulo some number very quickly. So let's give an example of this. First of all, notice that we don't actually need to write down the number m explicitly, as you will see in a moment, because when we're working out something to the power m, we can use the factorization of m for working out the
powers. It's probably easiest to just do an example. Let's take m to be 2^2 × 3 × 5 = 60, so we're taking a product of all the prime powers at most 5, just to make this easy enough to do by hand. And we're going to take our number n to be 119, the number we want to factor, and our random number a to be 2. Now we work out a^m modulo n. So how do we do this? Well, a^1 = 2, a^2 = 4, and a^4 = 16; here we're just using the fact that a^m = (((a^2)^2)^3)^5, which makes it much easier to work out. Each time we work out one of these powers, we reduce modulo n, of course. So now we work out (a^4)^3 = 16^3, which turns out to be congruent to 50 mod 119. And now we work out ((a^4)^3)^5 = 50^5, which turns out to be congruent to 50 modulo 119 again. So here we've got a^m, and gcd(a^m - 1, 119) = gcd(50 - 1, 119) = gcd(49, 119), which turns out to be 7. So we have found a factor of 119, and the key point is that 7 - 1 = 6 = 2 × 3 divides m = 60, because 6 is built from the small primes 2, 3, 5 that we put into m. So this method is pretty fast and efficient when it works, but there's a serious problem with it. We were assuming that our prime factor p of n has the property that p - 1 is a product of many small primes, and most of the time that just isn't true: if you take p - 1, it will probably have a couple of small prime factors and one very, very large one. I mean, the typical factorization of a random number has prime factors that tend to increase quite rapidly. There is, in fact, a variation of this method that gets around the problem, called Lenstra's elliptic curve method. What Pollard's p - 1 method is really using is the integers mod n that are co-prime to n.
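Pollard's p - 1 method as described above fits in a few lines of Python; note that m is never written down explicitly, since we exponentiate by one prime power at a time (a sketch; the bound parameter and the helper names are my choices):

```python
from math import gcd, isqrt

def small_primes(bound):
    """Primes up to bound, by trial division (fine for small bounds)."""
    return [q for q in range(2, bound + 1)
            if all(q % d for d in range(2, isqrt(q) + 1))]

def pollard_p_minus_1(n, bound, a=2):
    """Pollard's p - 1 method: compute a^m mod n, where m is the
    product of all prime powers up to `bound`, then take
    gcd(a^m - 1, n).  Succeeds when n has a prime factor p with
    p - 1 a product of prime powers at most `bound`."""
    for q in small_primes(bound):
        qk = q
        while qk * q <= bound:  # largest power of q that is <= bound
            qk *= q
        a = pow(a, qk, n)       # a <- a^(q^k) mod n, never forming m itself
    d = gcd(a - 1, n)
    return d if 1 < d < n else None
```

Here pollard_p_minus_1(119, 5) reproduces the worked example, returning 7, and pollard_p_minus_1(7913, 20) returns 41, since 41 - 1 = 40 = 2^3 × 5 is a product of prime powers at most 20.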
That is, you take the integers mod n and you take the elements co-prime to n, and the key point is that these form a group under multiplication. Lenstra's elliptic curve method is rather similar to Pollard's p - 1 method, except that instead of taking the integers mod n under multiplication, you take something called an elliptic curve, which also has a group structure. Anyway, I think elliptic curves are a little bit too complicated for an introductory number theory course, so I won't be discussing those. Okay, next lecture I'll be explaining some of the applications of number theory to cryptography.