This lecture is part of Berkeley Math 115, an introductory undergraduate course on number theory, and it will be about numerical calculation. So, let's have some typical examples of problems you might want to calculate the answer to. First of all, we might want to solve ax plus by equals c, where a, b, and c are given. This is, of course, just something to do with Euclid's algorithm (there's a small code sketch of this at the end of this introduction). Another problem: is n prime? And if it's not prime, we might want to factorize it. Thirdly, we might want to solve f of x congruent to 0 modulo p for some prime p, where f of x is a polynomial. For instance, we might want to solve x squared plus 1 congruent to 0 modulo p. Or, fourthly, we might want to work out a to the b modulo n.

Now, all of these problems, if you look at them, are in some sense trivial to solve. We can solve all of them just by doing a lot of case-by-case checks. For example, to test whether n is prime, we can just check all possible factors up to the square root of n, and so on. With all of these, there's no problem in principle of solving them. The problem is that we want to be able to solve them fast. The idea is that these numbers a, b, and c, and the prime p, might be very large, maybe tens or hundreds of digits, and we want an algorithm that's reasonably fast for solving them on a computer, assuming we've got one, of course. I mean, if these numbers are hundreds of digits, you don't want to do them by hand. So, we want to actually find reasonably fast algorithms, and we also want to estimate how fast the algorithms are.

So, let's start with a very simple example. Problem: find a plus b. Well, that's kind of stupid; you all know how to add up two numbers a and b. So, we're going to use the usual algorithm for that, and what we want to do is just estimate how long it takes. Well, for the usual algorithm, the number of operations is going to be about the number of digits of a and b, or maybe twice the number of digits or something. We're not interested in the exact constant multiplying the number of digits, so what we do is use something called the O notation: we say it takes O of the number of digits. What O means is that it's at most some constant times the number of digits. It might be twice the number of digits or three times the number of digits, but we don't really care very much, because that just depends on minor details of how you implement it. And we can ask: is this best possible? Well, obviously yes, because it takes O of the number of digits even to read in the number a. You can't even look at the number a without using at least this number of operations. So, we have a reasonably precise estimate for how long addition takes: O of the number of digits. And if we write n for the number of digits, we would just say this takes O of n steps. Notice that n is not the size of the numbers a and b; it's their length, the number of digits in decimal notation. So, you've got to be a bit careful about this. More generally, if you've got an algorithm, we'll see some examples where it might take O of n squared or O of e to the n steps, meaning at most some constant times n squared steps, and so on. And again, you've got to be careful: the number n is not one of the numbers a and b in the input. It's the length of the numbers written in decimal form; in other words, it's the length of the input to your algorithm.
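Here is the promised sketch of the first problem on the list: a minimal Python implementation of the extended Euclidean algorithm and how it produces one solution of ax plus by equals c. The function names are my own choices, not from the lecture.

```python
def extended_gcd(a, b):
    """Return (g, x, y) with g = gcd(a, b) and a*x + b*y = g."""
    if b == 0:
        return a, 1, 0
    g, x, y = extended_gcd(b, a % b)
    return g, y, x - (a // b) * y

def solve_linear(a, b, c):
    """Return one integer solution (x, y) of a*x + b*y = c, or None."""
    g, x, y = extended_gcd(a, b)
    if c % g != 0:
        return None          # solvable if and only if gcd(a, b) divides c
    k = c // g
    return x * k, y * k

# Example: 12*x + 17*y = 1 has the solution (-7, 5), since -84 + 85 = 1.
print(solve_linear(12, 17, 1))
```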
So, this is the standard way of estimating the running time of algorithms: use this big O notation. And we want to know what is a fast algorithm and what is a slow algorithm. Well, a very rough guideline is that polynomial time algorithms tend to be reasonably fast, and anything that takes more than polynomial time tends to be a bit slow. Polynomial time means something like O of n to the k for some fixed k, where n is the length of the input. People use polynomial time a lot because the exact running time of your algorithm depends on what computer you've got, exactly how you implement it, and so on. Details of your computer might change O of n to the 5 into O of n to the 6 or something. But if something is polynomial time, then it's probably polynomial time no matter what computer you're using. So, polynomial time is a kind of stable property of your algorithm.

So, now let's do another example. Let's try to multiply a times b and figure out how long it takes. Well, in the usual algorithm, for multiplying 13 by 17 or whatever, you take 3 times 7, which is 21, you take 1 times 7, which is 7, you take 1 times 3, which is 3, you take 1 times 1, which is 1, and then you add them all up with the appropriate shifts. As you can see, the number of steps is the number of digits of a times the number of digits of b, times some constant. So, this is an O of n squared algorithm: if n is the length of a and b, then it takes some constant times n squared steps, and I don't really care what this constant is; it might be 2 or a half or something. And it seems kind of obvious that this is best possible. I mean, if you've got a 10-digit number times a 10-digit number, it's very hard to imagine any way of multiplying them that's any better than taking every digit of a times every digit of b. Astonishingly enough, there are much faster algorithms. I just didn't believe this when I first heard it, because the usual algorithm is so simple and obvious that it's very hard to see how it could possibly be improved. Well, there's not just one faster algorithm; there are several, so I'll sketch two of them.

The first uses something called the fast Fourier transform, which sort of revolutionized signal processing. Very roughly, you take these numbers a and b and you apply something called a Fourier transform to them, getting a hat and b hat. The fast Fourier transform, as its name suggests, is very fast and takes O of n log n steps, roughly speaking. Then you do something called pointwise multiplication, which takes O of n steps, and this gives you the transform of a times b. And then you do something called an inverse fast Fourier transform. Actually, calling it an inverse fast Fourier transform is a bit misleading, because the inverse fast Fourier transform turns out to be essentially the same as the fast Fourier transform. And if we do this, we get a times b, and the whole thing takes O of n log n steps. Well, the problem with this is that I would need to tell you what the fast Fourier transform is, and that's a little bit complicated for an introductory number theory lecture. So, I'm not going to.
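Still, just to make the shape of the method concrete, here is a rough Python sketch of the transform, pointwise multiply, invert pipeline, treating numpy's FFT as a black box. The function name is mine, and the sketch assumes the inputs are small enough that floating-point rounding is exact; a serious implementation has to be much more careful about that.

```python
import numpy as np

def fft_multiply(a, b):
    """Multiply two non-negative integers via an FFT convolution of digits."""
    da = [int(d) for d in str(a)[::-1]]   # base-10 digits, units first
    db = [int(d) for d in str(b)[::-1]]
    size = len(da) + len(db)              # enough room for the full product
    fa = np.fft.rfft(da, size)            # "a hat"
    fb = np.fft.rfft(db, size)            # "b hat"
    conv = np.fft.irfft(fa * fb, size)    # pointwise product, then invert
    result = 0
    for i, c in enumerate(conv):
        result += int(round(c)) * 10 ** i   # Python big ints absorb the carries
    return result

print(fft_multiply(12345, 6789))   # 83810205
```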
Instead, I'm going to describe a slightly different method of doing very fast multiplication. The method I'm going to describe is easier to understand than the fast Fourier transform, but in practice it's usually a little bit slower, so if you really want to multiply numbers fast, you probably need to go and learn the fast Fourier transform. The idea is to use the Chinese remainder theorem. You remember the Chinese remainder theorem says that, for coprime a and b, the integers modulo ab are the same as the integers mod a times the integers mod b. In other words, if you know an integer modulo a, and you know it modulo b, then you know it modulo ab.

Now suppose you want to multiply two numbers, m times n. What we do is choose a lot of small primes, 2, 3, 5, 7, and so on, whose product 2 times 3 times 5 times 7 times and so on is bigger than m times n; let's call this product P. And now what we're going to do is work out mn modulo big P, and this will be enough to tell us what mn is, because P is bigger than mn. So how can we do this? Well, we can find mn mod P as follows. We find mn modulo 2, 3, 5, 7, and so on, and this is really fast, because these are fairly small primes and all we have to do is reduce m and n mod each prime and multiply the residues. So how many steps is this going to take? Well, it's going to be roughly the number of primes. And how many primes do we need? The number of primes will be about the logarithm of m, or the logarithm of n, so about the length of the input. It might be a bit bigger than that, it might be a little bit smaller; we're just giving a very rough estimate. And finding mn modulo each of these primes is again going to take very roughly that number of steps. Then we can reconstruct mn from mn modulo 2, 3, 5, and so on by the Chinese remainder theorem; you remember this uses Euclid's algorithm, and Euclid's algorithm is pretty fast. If you put all three steps together, you find the algorithm takes about n log n, or n times log n squared, steps or something, depending on exactly how you implement it.

Incidentally, this is a bit like the fast Fourier transform. In both methods, you do some preliminary transform: you either reduce m and n modulo lots of primes, or you take the Fourier transform. Then you reduce the problem to doing lots and lots of small individual multiplications: here you multiply mod 2, mod 3, mod 5, and so on, and there you do a lot of separate multiplications to work out the pointwise product of the Fourier transforms. So although the fast Fourier transform and the Chinese remainder theorem method look rather different, they're somehow both using the same key idea in the middle.
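Here is a small Python sketch of this pipeline. The prime list, the pairwise-combination helper, and the names are illustrative choices of mine; Python's built-in `pow(x, -1, m)` (Python 3.8+) does the Euclid's-algorithm step.

```python
def crt_pair(r1, m1, r2, m2):
    """Combine x = r1 (mod m1) and x = r2 (mod m2) into x mod m1*m2."""
    t = (r2 - r1) * pow(m1, -1, m2) % m2   # modular inverse via Euclid's algorithm
    return r1 + m1 * t, m1 * m2

def crt_multiply(m, n, primes):
    """Recover m*n from its residues modulo a list of distinct primes."""
    residues = [(m % p) * (n % p) % p for p in primes]   # the cheap small products
    x, mod = residues[0], primes[0]
    for r, p in zip(residues[1:], primes[1:]):
        x, mod = crt_pair(x, mod, r, p)
    # The primes must multiply to more than m*n for the answer to be determined
    # (in real use you'd bound m*n by digit counts rather than computing it).
    assert mod > m * n
    return x

primes = [2, 3, 5, 7, 11, 13]          # product 30030 > 123 * 231 = 28413
print(crt_multiply(123, 231, primes))  # 28413
```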
So in practice, there are really three different cases you have to think about when multiplying two numbers. First of all, there's the case of numbers that fit into one computer word, which these days is about 64 bits; in that case, you just use the hardware multiplication routine built into your computer. Another case is when the numbers fit into a few computer words, and then you just use the usual high school algorithm for multiplying, except that the base you use is 2 to the 64 or something. And the third case is very big numbers, when you use some sort of fast Fourier transform algorithm, or maybe this Chinese remainder algorithm if you can't remember how the fast Fourier transform works.

So let me give an example where using this Chinese remainder theorem method is actually possibly the best one. Suppose you have the following problem: you want to compute some determinant, maybe a 10 by 10 determinant where all the entries have, say, 100 digits. You could do this by just multiplying everything out, using maybe even a fast Fourier transform method, but that's going to be a bit complicated. The easy method is to compute the determinant modulo p for lots of small primes p: 2, 3, 5, 7, and so on. And again, we want the product of all these primes to be bigger than the absolute value of your determinant, so you have to estimate the maximum possible absolute value this determinant could have. You might say, well, you're going to take 100-digit numbers and multiply 10 of them together, and you're going to have a sum over 10 factorial of such products or something, so you can estimate it's going to have at most, you know, 2000 digits or whatever. So you find 1000 primes and work out the determinant modulo each prime. And I should say, in order to compute the determinant mod p, please use Gaussian elimination. You remember from a linear algebra course that there's a fast method of computing determinants using Gaussian elimination, and there's a really slow method where you sum over all ways of taking one entry from each row, multiplying them together, and adding everything up; that is a really stupid way to compute determinants because it's incredibly slow. So you have this fast way of computing each determinant modulo p, and then you can reconstruct the answer using the Chinese remainder theorem. In fact, sometimes all you want to know is whether the determinant is zero or not, and in that case you don't even need the Chinese remainder step: you can just check whether the determinant is zero modulo p for all these primes p.

The reason why this works so well is that computing determinants only involves addition, subtraction, and multiplication, and reducing numbers modulo lots of primes is really good for that. It's much worse if you want to do division with a remainder, or test whether one number is bigger than another. To do that, you've got to convert back to proper integers using the Chinese remainder theorem, which is a bit slow. So the reduction mod p method is really fast if you're just using addition, multiplication, and subtraction.
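Here is a minimal sketch of the mod-p step in Python: a determinant by Gaussian elimination over the integers mod p, assuming p is prime so every nonzero pivot is invertible. The function name is my own.

```python
def det_mod_p(matrix, p):
    """Determinant of a square integer matrix modulo a prime p."""
    a = [[x % p for x in row] for row in matrix]
    n, det = len(a), 1
    for col in range(n):
        # find a row with a nonzero entry in this column to use as the pivot
        pivot = next((r for r in range(col, n) if a[r][col]), None)
        if pivot is None:
            return 0                       # no pivot: determinant is 0 mod p
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det                     # a row swap flips the sign
        det = det * a[col][col] % p
        inv = pow(a[col][col], -1, p)      # p is prime, so the pivot is invertible
        for r in range(col + 1, n):
            factor = a[r][col] * inv % p
            for c in range(col, n):
                a[r][c] = (a[r][c] - factor * a[col][c]) % p
    return det % p

print(det_mod_p([[2, 1], [1, 3]], 7))      # det = 5, and 5 mod 7 = 5
```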
Another reason why this method is good is that it allows parallel processing. What this means is that instead of having one computer work out the determinant mod p for 1000 different primes, you can have 1000 computers all working together. Each computer is just doing one prime, and the calculations modulo the various primes are independent of each other, so the computers can all do their calculations simultaneously. Some modern very fast computers have many thousands or millions of processors all working together, so if your calculation allows parallel processing, you can sometimes speed it up a lot. You can't always, though. Think of the analogy of digging a hole. If your hole is a big trench, then that's easy to do by parallel processing: you can have 500 people all digging a little bit of the trench at the same time, and it's very fast. However, if your hole is something like an oil well that goes very deep into the ground, then having 500 people all trying to dig at the same time is not really possible; there's only room for one guy at the bottom of the hole doing the digging. So that's something where you can't do parallel processing. Problems in number theory are the same: some of them are very good for parallel processing, and some of them just aren't.

I should also give a bit of a warning about log n factors. You'll sometimes see people say the running time of an algorithm is O of n, or O of n log n, or O of n times log n cubed, or O of n log log n, or various other things. What I want to warn you about is that a lot of these estimates are kind of meaningless. In particular, factors of log log n are generally meaningless in practice. The point is that log log n is almost constant. In fact, I've been doing calculations for a long time, and I can tell you that log log n is nearly always approximately three, unless your number is very big indeed. For example, suppose you've got two algorithms, one of which takes n times log log n steps and the other of which takes 100 n steps. You might think that the second algorithm is going to be better, because if n is large, 100 n is going to be much less than n log log n. But 100 n will only be less than n log log n when log log n is bigger than about 100, and this would only happen when n is approximately e to the e to the 100. That number is so ridiculously large that you can't even write it down in the visible universe. So in practice, the apparently slower algorithm, the n log log n one, will be better. And that's because, as I said, log log n just doesn't get big in practice.
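Just to put numbers to that claim, here is a quick check of mine, not from the lecture, of how slowly log log n grows:

```python
import math

# Even a 100-digit number only has log log n around five or so.
for digits in (6, 18, 100, 1000):
    n = 10 ** digits
    print(digits, round(math.log(math.log(n)), 2))
# prints: 6 2.63, 18 3.72, 100 5.44, 1000 7.74
```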
In fact, even factors of log n should be taken with a big pinch of salt, because the actual running time of an algorithm will not be the running time given by your theoretical estimate. The point is, there are always extra time delays that the theoretical analysis does not take into account. For example, if you're running a big computer, it might fill a room or something, and during your calculation you've got to retrieve numbers from storage, and the storage is going to be on the other side of the room. Well, that's not a big deal, because your signals are traveling at the speed of light. But light is going to take several nanoseconds to cross that room, and on computers, operations often take just one nanosecond. So if you've got to retrieve something from across the room at the speed of light, that's going to slow down your calculation by quite a large factor, 10 or 100 or something. These delays are not taken into account by the theoretical estimates, and they will give you extra factors of log n, or maybe even an extra factor of the cube root of n or something; I'm too lazy to work out exactly what it is. So all these estimates of running time are rough guidelines. An algorithm that takes O of n steps is probably better than an algorithm that takes O of n times log n cubed steps, but in practice it might not be, if there's a really big constant in front of the first algorithm and so on.

We should also ask how many operations are possible in a calculation. This, of course, varies with time, so I can give you a historical example. In the 1960s, the English mathematician Peter Swinnerton-Dyer wrote a paper on computational number theory, and he gave an estimate for this at the time. He said that a million operations was trivial; that a billion operations was possible but needed thought, in other words, you should rather carefully write your computer program to be as efficient as possible; and that a trillion operations was about the limit of what it was possible to do, and could only be justified by a major scientific advance. Well, we're now half a century beyond that, and 10 to the 12 operations is now trivial. I mean, your screensaver probably burns through more than a trillion operations. 10 to the 18 operations is now possible. Large supercomputers, I don't know what the current record is, but a few years ago they were doing 10 to the 15 operations a second, at least that's what they claimed to be doing, so they could get through 10 to the 18 operations in about an hour or so. So that's certainly possible. But again, if you're using a really big supercomputer, you probably want to spend a certain amount of time optimizing your program, because running these things costs serious amounts of money. What is the limit of what we can do? Well, I really don't know, and who knows what the NSA have sitting in their basement. Somewhere between 10 to the 24 and 10 to the 30 operations is probably about the limit of what we can do. Of course, this might speed up quite a lot some time in the near future, because we have people trying to build quantum computers. At the moment, quantum computers can't do all that much, but if the size of quantum computers increases by a factor of a few thousand, then they will be able to do things that are beyond the ability of any imaginable classical computer. Another thing you can do to speed up computations quite a lot is to build specialized hardware, which can give you another factor of a few thousand. For instance, the Deep Blue chess-playing computer, which was the first computer to beat a world chess champion about 20 or 30 years ago, used specially designed chips for playing chess. Another example that's very common these days is specially designed chips for mining Bitcoin, which are thousands of times faster than an ordinary laptop, which is why you're probably not going to get very far mining cryptocurrency just by using your laptop or cell phone.

So now I'm going to discuss some other algorithms. First of all, I want to recall the so-called Russian peasant algorithm. It's supposed to be called the Russian peasant algorithm because there are claims that when travelers went to Russia in the 19th century, they found Russian peasants multiplying numbers using this algorithm. I don't believe this for a moment, because I'm sure Russian peasants had better things to do with their time than long multiplication of integers. But anyway, it's got this catchy name of Russian peasant multiplication, so let me explain how to do it. Let me try to multiply 13 by 5. Apparently, Russian peasants knew how to multiply and divide by 2.
They also knew how to do addition, but didn't know anything else, which is again one of the reasons I don't believe this was done by Russian peasants. So what you do is you keep dividing 13 by 2, throwing away any remainder: 13 divided by 2 is 6, 6 divided by 2 is 3, 3 divided by 2 is 1, and 1 divided by 2 is 0. And you notice that sometimes we had to throw away a remainder: at the first, third, and fourth steps. Alongside this, I keep multiplying 5 by 2: 5, 10, 20, 40. And now what I do is take all the rows where we had a remainder, which are the rows with 5, 20, and 40, and add those up, and I get 65, which is indeed 5 times 13. The reason this works is that you're really converting the numbers into binary and using the usual multiplication algorithm in binary. The repeated division by 2 just says that 13 is 1101 in binary: each 1 comes from a step where there was a remainder. And the repeated multiplication by 2 is just what you do for binary multiplication. This is actually a really terrible method for multiplying numbers: it's like the usual multiplication algorithm in base 10, except much slower, because you're first converting to binary and then multiplying in binary, which takes ages. So at first sight it's a stupid algorithm. But the idea behind it turns out to be really, really useful.

So now let's look at the problem of how to work out a to the b modulo m. You remember I actually discussed this earlier, so I'm just going to review it again. You remember there was a stupid algorithm where we just take a times a times a and so on, multiplying over and over. This takes an insanely long time, sort of O of e to the n, because the number of times you have to multiply a by itself is about equal to the number b, and the size of b is exponential in the length of b. So this is an exponential time algorithm, and it's completely awful. And you remember I also said we should reduce mod m at each step. As well as sometimes taking a lot of time, algorithms sometimes take a lot of space. The amount of space an algorithm takes is obviously at most the amount of time it takes, because each piece of space you use takes at least a step of time to touch. But if you want to be really careful, you occasionally get algorithms that use a lot of time but a small amount of space. We can reduce the amount of space this algorithm uses by reducing modulo m every time we multiply, and that makes sure the numbers don't get too big.

The third method is to use Russian peasant exponentiation, and this is actually the algorithm I mentioned earlier. Let me show you how you do this; there's also a code sketch of both peasant methods just below. Suppose I want to work out 2 to the 13. What I do is take the number 13 and keep dividing it by 2, just as the Russian peasants did for multiplication, keeping track of when there's a remainder left over. Now I take the number 2, and instead of multiplying it by 2, I keep squaring it: I get 4, which is 2 squared, then 16, which is 2 to the 4, then 256, which is 2 to the 8. And now I take the steps where there was a remainder and multiply together the corresponding numbers: I get 2 times 16 times 256, which is 8192, which is 2 to the 13.
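Here is a minimal Python sketch of both peasant methods side by side, to make the doubling-versus-squaring swap concrete; the function names are mine.

```python
def peasant_multiply(a, b):
    """Multiply by repeated halving of a and doubling of b."""
    total = 0
    while a > 0:
        if a % 2 == 1:        # a remainder: this row contributes
            total += b
        a //= 2               # halve, throwing away the remainder
        b *= 2                # double
    return total

def peasant_power(a, b, m):
    """Compute a**b mod m by repeated halving of b and squaring of a."""
    result = 1
    a %= m
    while b > 0:
        if b % 2 == 1:
            result = result * a % m   # reduce mod m to keep the numbers small
        b //= 2
        a = a * a % m                 # square instead of double
    return result

print(peasant_multiply(13, 5))        # 65
print(peasant_power(2, 13, 1000))     # 8192 mod 1000 = 192
```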
So you see, this is just like Russian peasant multiplication, except that instead of doubling numbers, we're squaring them, which means that instead of multiplying 2 by 13, we end up raising 2 to the power 13. We should do a quick estimate of how long this actually takes. Well, the number of divisions is going to be about O of n, where n is the length of the exponent, and so the number of squarings is going to be about O of n as well. Each squaring takes some number of steps depending on how big the numbers are and how you multiply, so we're going to get terms like O of n log n and so on; I'm not going to work them out exactly. And then we have to multiply the results together, reducing modulo m at each step, and that's going to take O of n times log n to the power of something. As I said, I don't really care what the exponent is. So the conclusion is that exponentiation is actually a really fast algorithm: it's not much more than linear in the length of the input, especially if you use some sort of fast multiplication involving fast Fourier transforms, which changes the exact powers of log n, which is why I'm being a bit vague about it. So Russian peasant exponentiation is really fast, and working out a to the b modulo m is one of the most commonly used operations in computational number theory, precisely because it's so fast.

So we can ask: is this actually best possible? Can we get by with fewer multiplications for exponentiation than the Russian peasant method uses? The answer is that it's not best possible; we can actually do better. This seems surprising, because Russian peasant exponentiation looks so efficient. You certainly can't do more than a factor of two better, because each multiplication at most doubles the exponent, so any method needs at least log to base 2 of the exponent multiplications, and the repeated squarings alone already account for at least half of what the Russian peasant method uses. However, we can do better. Let's work out a to the power 15. The stupid method, a times a times a, is going to take 14 multiplications. What about the Russian peasant method? We work out a, then a squared, then a to the 4, then a to the 8, and then we have to multiply together a, a squared, a to the 4, and a to the 8: we get a to the 9, then a to the 11, then a to the 15. Notice that each exponent here is the sum of two exponents that come before it. And if we count, we've got one, two, three, four, five, six multiplications, so it's definitely better than the stupid method. But there's an even better method. We work out a, a squared, a cubed, a to the 4, a to the 5 using the stupid method, which is four multiplications so far. Sorry, actually we don't need a cubed: a, a squared, a to the 4, a to the 5 takes just one, two, three multiplications. And now we take a to the 5, a to the 10, a to the 15, so we're raising a to the 5 to the third power using the stupid method, and this takes another two multiplications. So altogether we get five multiplications, and we've actually done better than Russian peasant exponentiation. We can do something similar whenever we're working out a to the power m times n: we can first work out a to the m and then raise that to the power n, and this will sometimes be faster than the Russian peasant exponentiation method.
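Here is a small Python check of the a-to-the-15 example. A sequence like 1, 2, 4, 5, 10, 15, where each entry is the sum of two earlier ones, is usually called an addition chain (the standard name; the lecture doesn't use it), and the helper below is my own illustration: it evaluates the power along a given chain and counts the multiplications.

```python
def power_by_chain(a, chain):
    """Evaluate a**chain[-1] along an addition chain starting at 1."""
    powers = {1: a}
    count = 0
    for e in chain[1:]:
        # each exponent must be a sum of two earlier ones (possibly equal)
        i = next(i for i in powers if e - i in powers)
        powers[e] = powers[i] * powers[e - i]   # one multiplication per step
        count += 1
    return powers[chain[-1]], count

# the five-multiplication chain for a**15 from the lecture, with a = 3
print(power_by_chain(3, [1, 2, 4, 5, 10, 15]))   # (14348907, 5), and 3**15 = 14348907
```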
Let me just say a little bit more about this. People have actually studied the problem of the smallest number of multiplications needed to work out a to the n, and this is sometimes denoted by L of n. People have looked at this function, and it's known to within a factor of two: L of n is at most 2 times the logarithm to base 2 of n, and it's at least the logarithm to base 2 of n. But trying to get it exactly is fairly tricky, and it has some very subtle behavior. For example, it seems obvious that L of 2n is equal to L of n plus 1, because it seems obvious that the most efficient way of working out a to the power 2n is to work out a to the power n and then square it, and you can't possibly do any better than that, because squaring is such an efficient operation. But this is actually false. The smallest counterexample is L of 191, which equals L of 2 times 191, both being 11. The point is, you can work out L of 382, which is 2 times 191, by using some odd numbers in a slightly more efficient way. The exact sequence you can use is 1, 2, 4, 5, 9, 14, 23, 46, 92, 184, 198, 382, where each number in the sequence is the sum of two earlier numbers, and there are only 11 steps in it. Checking that L of 191 is also 11 is rather more tedious, because you've got to check a lot of cases. Even more bizarrely, it's possible for L of n to be greater than L of 2n, which is even more astonishing. An example of this was found by Neill Clift only a few years ago; his value of n was 375494703. He used a computer to find this, and I really hope he used a computer: if he did this by hand, it's really rather scary. If you'd like an open problem about this function L of n, there's apparently an open problem to show that L of 2 to the n minus 1 is less than or equal to n minus 1 plus L of n. It's been checked by computer for about the first 40 or so values, but beyond that it's an open conjecture.

Another basic operation is to evaluate a polynomial. So we might have 3 x to the 4 plus 2 x cubed plus 5 x squared plus x plus 7, and I just want to comment on this, because the obvious method of evaluating polynomials is not very efficient. The obvious method is: we calculate 1, x, x squared, x cubed, x to the 4, which takes one, two, three multiplications, and then we multiply these by the coefficients 3, 2, 5, 1, and 7, which takes another five multiplications. However, this is not a very good way of evaluating polynomials. In the days when people had to do calculations by hand, they were taught something called Horner's method for evaluating polynomials, where you evaluate the polynomial like this: you take x times 3, then add 2, then multiply by x, then add 5, then multiply by x, then add 1, then multiply by x, then add 7. And you notice that if you arrange the polynomial like this, you've reduced the number of multiplications by a factor of about two. So even when you implement polynomial evaluation on a computer, you should remember to do it in this efficient way.
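As a quick sketch, here is Horner's method in Python; the coefficient ordering and the function name are my choices.

```python
def horner(coeffs, x):
    """Evaluate a polynomial at x; coeffs are listed highest degree first."""
    result = 0
    for c in coeffs:
        result = result * x + c    # one multiplication and one addition per coefficient
    return result

# 3x^4 + 2x^3 + 5x^2 + x + 7 at x = 2: four multiplications instead of eight.
print(horner([3, 2, 5, 1, 7], 2))   # 93
```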
So next I want to discuss the problem of finding a square root of minus one modulo p, but I think I'll put that in the next video.