This lecture is part of Berkeley Math 113, an introductory undergraduate course on number theory, and will be a little bit about applications of number theory to cryptography. Cryptography is just the art of writing or decoding secret messages. I'll start with a bit of background. The main problem is that we have two people, traditionally called Alice and Bob, who are communicating with each other, and someone who is intercepting their messages, traditionally called Eve, which is short for eavesdropper, I guess. Alice and Bob wish to communicate without Eve being able to figure out what they're saying. For example, Alice might be a submarine, Bob might be the admiral controlling the submarine, and Eve might be some sort of enemy trying to sink the submarine. As another example, Alice might be, say, Amazon, Bob might be you trying to buy something on Amazon, and the person in the middle might be some sort of thief trying to steal your credit card number. In particular, all of internet commerce depends on being able to communicate with someone without anyone else decoding what you're saying. The traditional way of doing this is for Alice and Bob to agree on something in advance. For instance, there could be a code book. This was often used by submarines in the Second World War. The submarine would have a code book where the message AAAA means "attack at dawn" or something like that, so you would just send the message AAAA and the submarine would be able to figure out what it means. Another method is to use something called a one-time pad. A one-time pad is just a collection of supposedly random numbers; Alice and Bob both have a copy, and the sender adds these random numbers to the message, so that the receiver can decode it but Eve can't. A third is something like the German Enigma machine in the Second World War.
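To make the one-time pad concrete, here is a minimal Python sketch. It assumes messages use only the letters A to Z and that "adding" the random numbers means shifting each letter by the corresponding pad number modulo 26; the names make_pad, encrypt and decrypt are illustrative, not from any library. The pad is drawn with Python's standard secrets module, must be at least as long as the message, and must never be reused.

```python
import secrets
import string

ALPHABET = string.ascii_uppercase  # assumption: messages use only the letters A-Z


def make_pad(length):
    """Return `length` random shifts in 0..25 from a cryptographically secure source."""
    return [secrets.randbelow(26) for _ in range(length)]


def encrypt(message, pad):
    """The sender adds each pad number to the corresponding letter, modulo 26."""
    if len(pad) < len(message):
        raise ValueError("the pad must be at least as long as the message")
    return "".join(ALPHABET[(ALPHABET.index(c) + k) % 26] for c, k in zip(message, pad))


def decrypt(ciphertext, pad):
    """The receiver subtracts the same pad numbers to recover the message."""
    return "".join(ALPHABET[(ALPHABET.index(c) - k) % 26] for c, k in zip(ciphertext, pad))


pad = make_pad(12)                 # shared by Alice and Bob in advance, used only once
ciphertext = encrypt("ATTACKATDAWN", pad)
print(ciphertext)                  # looks like random letters to Eve
print(decrypt(ciphertext, pad))    # ATTACKATDAWN
```

Without the pad, every 12-letter plaintext is equally consistent with the ciphertext, which is why Eve learns nothing from it.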
This was a very ingenious machine that turned messages into seemingly random garbage and then decoded them at the other end, so Alice and Bob would both have an Enigma machine. Well, there's a big problem with all of these: they require Alice and Bob to share something. The submarine and the admiral need to be given the same code book, or the same one-time pad, or the same Enigma machine. This is sometimes okay; it works if you've got a ship which starts off with a code book and can then use it. But it has several problems. First of all, if the code book or Enigma machine is compromised, it can be very difficult to replace it. In the Second World War, for example, I think British ships used code books, and the British eventually figured out that the Germans were reading them, and there wasn't very much they could do about it, because their ships were several weeks away from their home base and would have had to travel back, get a new book and travel out again. So even though the British knew their code books had been compromised, there wasn't anything they could do about it for some time. Similarly, if you're trying to buy something on Amazon, you can't get Amazon to send you a code book every time you want to buy something; this would be quite hopeless. So it's very useful for Alice and Bob to be able to communicate without exchanging a physical object beforehand, and without having to share any information that Eve doesn't know. We assume that Eve knows absolutely everything that passes between Alice and Bob. If they exchange a code book, then Eve intercepts the code book and gets a copy of it. And if they're using Enigma machines, then Eve steals a copy of the Enigma machine, which is incidentally what happened in the Second World War at one point.
The Germans sent an early Enigma machine somewhere via the Polish post office, and the Polish post office quietly made a copy of it. So we have this problem: how can you communicate securely even if someone can intercept all your communications? The basic idea for doing this was worked out by Diffie and Hellman, using something called a trapdoor function. Let me try to explain what a trapdoor function is. A trapdoor function is a function F that is easy to compute, but whose inverse is very hard to compute unless you know some secret. I'll explain how you can arrange this in a moment. It will usually be a function between two finite sets, and these finite sets are really large: for example, it might be a function from the set of all integers less than 10^1000 to itself. It's easy to compute, and the method of computing it is made public. On the other hand, you can't compute its inverse easily. You may ask: if you know how to compute F, why can't you compute its inverse just by trying all numbers in the domain until you find one with the right image? That's why the set needs to be very, very big: if it contained only a million elements, you could compute the inverse just by trying each of these million elements until you found the right one. So F must have the funny property that the inverse can be computed easily if you know some secret about F. I'll explain how you can arrange that a little later, but first let me show you how you can use it to communicate. Suppose A and B are trying to communicate. A chooses a trapdoor function F_A and makes F_A public. In other words, everybody, including Eve and B, knows F_A. B sends messages to A by encoding them using F_A.
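The point about the size of the domain can be checked directly. The Python sketch below uses x^K mod M purely as a stand-in public function, with made-up toy values K = 7 and M = 1,000,003, and inverts it by trying every element of the domain in turn; that is at most about a million easy steps. With a domain of 10^1000 elements the same loop would never finish.

```python
from typing import Optional

K, M = 7, 1_000_003  # made-up toy parameters: the domain has only about a million elements


def f(x: int) -> int:
    """Stand-in 'public' function x^K mod M, easy to compute in the forward direction."""
    return pow(x, K, M)


def invert_by_search(y: int) -> Optional[int]:
    """Brute force: try every x in the domain until one has image y."""
    for x in range(M):
        if f(x) == y:
            return x
    return None


y = f(123_456)
x = invert_by_search(y)
print(x, f(x) == y)
```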
So if B's message is M, then B sends the number F_A(M). Eve intercepts F_A(M) but can't decode it, because the trapdoor function is very hard to invert. But A can decode it, because A knows the secret. That shows how B can send messages to A. How does A send messages to B securely? B also chooses a trapdoor function F_B and makes it public, and A uses B's trapdoor function to send messages to B. So if we can find trapdoor functions, then A and B can communicate securely: each of them chooses a trapdoor function and publicizes it. The problem is how to find trapdoor functions. First, let me describe the difference between a trapdoor function and a secure hash function, which are rather different things. Both are functions from a set with, say, 10^1000 elements to a set with 10^1000 elements, and both are reasonably easy to compute. For a secure hash function, the inverse F^-1 is very hard for anyone to compute. For a trapdoor function, F^-1 is hard to compute unless you know some secret; that secret is the trapdoor. Secure hash functions are also very useful; for example, they are used in blockchains. So I'll briefly explain what a blockchain is. A blockchain consists of a whole lot of blocks. Each block is some collection of data, typically something like a million bytes. It's chained in the sense that each block in the blockchain contains a secure hash of the previous blocks: you take a secure hash function of everything in the previous blocks and add it to the new block. This means it's impossible to change anything in one of the previous blocks without being detected by someone who knows the latest block, because the latest block contains a secure hash that depends on the entire contents of the previous blocks.
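Here is a minimal Python sketch of the chaining idea. It assumes SHA-256 (from the standard hashlib module) as the secure hash and a made-up block format consisting of a data string plus the hash of the previous block; this is not Bitcoin's actual block format. Because each block's hash covers the previous block's stored hash, the hash of the last block depends on the contents of every earlier block.

```python
import hashlib
import json


def block_hash(block):
    """SHA-256 of a canonical JSON encoding of the block."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def make_chain(payloads):
    """Build blocks that each store the hash of the block before them."""
    chain, prev = [], "0" * 64  # placeholder "previous hash" for the first block
    for data in payloads:
        block = {"data": data, "prev_hash": prev}
        chain.append(block)
        prev = block_hash(block)
    return chain


def verify(chain, trusted_last_hash):
    """Check every link, ending at a last-block hash obtained from a trusted source."""
    for earlier, later in zip(chain, chain[1:]):
        if block_hash(earlier) != later["prev_hash"]:
            return False
    return block_hash(chain[-1]) == trusted_last_hash


chain = make_chain(["Alice pays Bob 5", "Bob pays Carol 2", "Carol pays Dave 1"])
last = block_hash(chain[-1])
print(verify(chain, last))          # True
chain[0]["data"] = "Alice pays Bob 500"
print(verify(chain, last))          # False: the first link no longer matches
```

Rewriting the later blocks to hide the change would alter the last block's hash, so it would still fail against the trusted value.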
Knowing the secure hash of the previous blocks allows you to verify that nobody has changed them, because no one could change them while keeping the same secure hash unless they could essentially invert the hash function, which by assumption is very hard to do. Cryptocurrencies like Bitcoin make use of a blockchain, and these blockchains can be pretty large. For instance, the current Bitcoin blockchain has several hundred thousand blocks, each typically somewhere around a megabyte, and the total size of the blockchain is hundreds of gigabytes. But you don't always need to pass around these hundreds of gigabytes: if you know the last block in the blockchain, you can verify that none of the previous blocks has been changed, if you get suspicious about what's going on. Secure hash functions are also used to mine bitcoins. Mining bitcoin means finding a value A such that f(A) is small, or nice in some sense; for example, you might want to find A such that the last 20 bits of f(A) are zero. Maybe "small" is the wrong word; it has to be nice in some sense. You can only do this by trial and error, testing, say, 10^20 possibilities, and this takes a great deal of work: you've got to do 10^20 operations. This is basically why bitcoin mining is difficult and uses enormous amounts of power. Bitcoin miners are basically racing to find numbers A such that f(A) has some nice property. There's a bit of a problem about whether secure hash functions have trapdoors. Some people have published supposedly secure hash functions for people to use, and whenever people do that, there's always some speculation about whether they've secretly hidden a trapdoor that makes the hash function easy to invert for whoever knows the secret.
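A toy version of this search, sketched in Python under the assumption that "nice" means the last few bits of a SHA-256 hash are zero, as in the example above (real Bitcoin instead requires the double SHA-256 of a block header to fall below a target). The function name mine and the 8-byte nonce encoding are illustrative choices. Each extra required zero bit doubles the expected work, so about 2^zero_bits attempts are needed on average.

```python
import hashlib


def mine(header: bytes, zero_bits: int) -> int:
    """Try nonces 0, 1, 2, ... until SHA-256(header + nonce) ends in `zero_bits` zero bits."""
    mask = (1 << zero_bits) - 1
    nonce = 0
    while True:
        digest = hashlib.sha256(header + nonce.to_bytes(8, "big")).digest()
        if (int.from_bytes(digest, "big") & mask) == 0:  # last zero_bits bits are all zero
            return nonce
        nonce += 1


nonce = mine(b"block contents", 16)  # about 65,000 attempts on average
print(nonce)
```

Finding the nonce takes many attempts, but anyone can check a claimed nonce with a single hash.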
I haven't actually heard of any cases where this has happened, but if you're paranoid, this is the sort of thing that keeps you awake at night. Anyway, we now have to solve the following problem: find a trapdoor function, that is, a function that's easy to calculate but whose inverse is very hard to calculate, even if you know how to calculate the function, unless you know the secret. The first solution was found by Clifford Cocks, who didn't publish it for various reasons; it was later rediscovered by Rivest, Shamir and Adleman, and is now known as the RSA method. It works as follows. We choose large primes P and Q. What does large mean? This is a bit time dependent. As we'll see in a moment, they must be large enough that it's very difficult to factorize PQ. Making them 100-digit primes is probably not secure enough these days; making them 1000-digit primes is probably good enough. We also choose a large integer K. Now we publish M = PQ and K, but not the individual numbers P and Q, so the factorization of M is the secret trapdoor. How do you send messages? We define F(X) = X^K mod M. This is the public function. We tell everyone what K and M are, and they can work out X^K mod M very quickly, because we all know the Russian peasant method of computing powers mod M fast by repeated squaring. Typically X, K and M will all have hundreds or thousands of digits, so we're doing exponentiation on very big numbers. How do we compute F^-1, the inverse of F? We know that X^(φ(M)) ≡ 1 mod M, at least if X and M are coprime, and X and M will almost always be coprime unless some weird coincidence happens. So X^(JK) ≡ X mod M whenever JK ≡ 1 mod φ(M), where φ is Euler's phi function.
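The fast exponentiation mentioned here can be sketched in Python as follows: read K in binary, square repeatedly, and multiply in the squares that correspond to 1 bits, so only about 2·log2(K) multiplications mod M are needed. The toy key P = 61, Q = 53, K = 17 is an illustrative assumption; real primes have hundreds of digits. Python's built-in three-argument pow(x, k, m) does the same job and is what you would use in practice.

```python
def power_mod(x: int, k: int, m: int) -> int:
    """Compute x^k mod m by repeated squaring (the 'Russian peasant' method)."""
    result = 1 % m
    base = x % m
    while k > 0:
        if k & 1:                   # current binary digit of k is 1
            result = result * base % m
        base = base * base % m      # move on to the next power x^(2^i)
        k >>= 1
    return result


# Toy RSA public key (far too small to be secure)
P, Q = 61, 53
M = P * Q          # published
K = 17             # published


def F(x: int) -> int:
    """The public encoding function F(x) = x^K mod M."""
    return power_mod(x, K, M)


print(F(65))  # 2790
```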
That's because X^(φ(M)) ≡ 1 mod M, which is just Euler's theorem: if JK = 1 + nφ(M), then X^(JK) = X · (X^(φ(M)))^n ≡ X mod M. If we know P and Q, we can compute φ(M) = (P - 1)(Q - 1), and we can then use Euclid's algorithm to solve JK ≡ 1 mod φ(M) for J (this needs K to be coprime to φ(M), so K should be chosen that way). This means we can invert F, because F^-1 is just given by X ↦ X^J mod M: taking the Kth power followed by the Jth power gives you back X. So we can compute the inverse of F if we know J, which we can find if we know the factorization of M. If you know the secret factorization of M, you can decode messages. If you don't, you're kind of stuck. You either have to find a fast way to factorize M, which nobody knows how to do (or at least, if they do, they're not admitting it), or you have to find some other way of inverting this function, and so far nobody has figured out how to do that. So this method relies on the fact that it is easy to find large primes but hard to factorize large numbers. If we've got a number with 1000 digits, we can test whether it's prime with high probability, but we don't know how to factorize it in general. How do you find large primes? Here's one way: pick a random large number and apply, say, a probabilistic primality test, and keep going until you find one that passes. How long do you have to keep going? The chance of a random 1000-digit number being prime is about 1/ln(10^1000), roughly one in 2,300, so you need to try a few thousand large numbers. Of course, you can speed things up a lot by not bothering to test numbers that are even, or divisible by 3, and so on.
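A minimal Python sketch of this step, using the extended Euclidean algorithm to solve JK ≡ 1 mod φ(M). The toy key P = 61, Q = 53, K = 17 is an illustrative assumption, and this is unpadded "textbook" RSA, which is fine for seeing the arithmetic but not secure as it stands. Since Python 3.8, pow(K, -1, phi) computes the same modular inverse.

```python
def extended_gcd(a: int, b: int):
    """Return (g, s, t) with g = gcd(a, b) and s*a + t*b == g (Euclid's algorithm)."""
    if b == 0:
        return a, 1, 0
    g, s, t = extended_gcd(b, a % b)
    # g = s*b + t*(a % b) = t*a + (s - (a // b)*t)*b
    return g, t, s - (a // b) * t


def private_exponent(p: int, q: int, k: int) -> int:
    """Solve J*K ≡ 1 (mod φ(M)) using the secret factorization M = p*q."""
    phi = (p - 1) * (q - 1)
    g, s, _ = extended_gcd(k, phi)
    if g != 1:
        raise ValueError("K must be coprime to phi(M)")
    return s % phi


P, Q, K = 61, 53, 17        # toy key; only the owner knows P and Q
M = P * Q
J = private_exponent(P, Q, K)

message = 65
ciphertext = pow(message, K, M)   # what B sends: F(x) = x^K mod M
decoded = pow(ciphertext, J, M)   # what A computes: x -> x^J mod M
print(J, ciphertext, decoded)     # 2753 2790 65
```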
So finding large random primes is actually quite easy, but factorizing their product is hard. This is the RSA method of sending messages. By the way, I should mention a problem that people have run into: it's actually quite tricky to make sure your primes are sufficiently random, and several people have been caught out by using random number generators that turned out to be rather less random than they thought. First of all, never use your programming language's built-in random number generator for this. The problem is that these random number generators are not actually random. Many programming languages have a random function that supposedly produces a random number, but in fact it only produces a pseudorandom number. These are produced by a very simple procedure and are random enough for many applications: if you want to do Monte Carlo integration or something, they're probably good enough. But you have to assume your enemy knows your random number generator, and if they know your random number generator, they can probably figure out what primes you're using. So this is useless. People have also tried basing the random number on the time of day or something like that. The trouble is that once someone figures out you are basing your random number on the time of day, they can simply work out the time themselves and reconstruct your supposedly random number. It's actually pretty difficult to get a computer to generate random numbers, because computers, more or less by definition, are not random. One thing computers do to produce random numbers is ask the user to type on a keyboard. If the user types several thousand letters, you ignore the letters themselves, because humans are actually quite bad at typing random letters. Instead, you record the timing of the keystrokes to within a millisecond and use, say, the last bit of each millisecond time as a sequence of random bits. But this takes effort.
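Putting these points together, here is a Python sketch of generating a large random prime. It assumes the Miller-Rabin test as the probabilistic primality test (the lecture doesn't name one) and draws all randomness from Python's standard secrets module, which uses the operating system's cryptographically secure source rather than the predictable generator in the random module. It also rejects multiples of small primes cheaply before running the full test. The function names are illustrative.

```python
import secrets

SMALL_PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]


def is_probable_prime(n: int, rounds: int = 40) -> bool:
    """Miller-Rabin: False means certainly composite; True means prime with high probability."""
    if n < 2:
        return False
    for p in SMALL_PRIMES:            # cheap trial division first
        if n % p == 0:
            return n == p
    d, s = n - 1, 0                   # write n - 1 = 2^s * d with d odd
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = secrets.randbelow(n - 3) + 2   # random base in [2, n - 2]
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False              # a is a witness that n is composite
    return True


def random_prime(bits: int) -> int:
    """Draw random odd numbers with exactly `bits` bits until one passes the test."""
    while True:
        candidate = secrets.randbits(bits) | (1 << (bits - 1)) | 1  # top bit and low bit set
        if is_probable_prime(candidate):
            return candidate


print(random_prime(512))
```

Roughly 1000-digit primes correspond to random_prime(3322), which works the same way but is noticeably slower in pure Python.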
I mean, the poor user has to sit there for five minutes typing frantically in order to produce random numbers. So it's actually quite tricky to produce random numbers on a machine that isn't random. Let me comment a bit on how you might break this, or more generally, briefly summarize a few methods of breaking codes. One would be to factor the number M. This is hard with current technology, but if you had a big quantum computer, you could actually do it: there's an algorithm for quantum computers called Shor's algorithm, which will factorize large numbers very fast, provided you've got a sufficiently big quantum computer. It's not quite clear how big the quantum computer has to be. A quantum computer with a few hundred thousand or a few million qubits might be enough to seriously dent internet commerce that uses the RSA algorithm. Of course, people doing internet commerce know all about this problem and are busily trying to invent methods of sending information that are proof against attack by quantum computers. At the moment, quantum computers have somewhere around 50 or 100 qubits, so they've still got to be scaled up quite a lot, and it's unclear whether this is going to be possible. There's also the interesting side effect that we don't actually know what the best factoring algorithm is. We know the best public factoring algorithm, but if someone came up with a really fast factoring algorithm, they might well keep it secret because it would be so valuable: you could break everybody's codes, steal all the credit card numbers on the internet, and so on. So factoring algorithms have now become military secrets. Alternatively, you could try to decode without factoring M. So far, nobody has found a way of breaking RSA without factoring M, and conversely, I don't think anyone has found a really convincing argument that there is no such method.
So that's still open. There are all sorts of other methods. There's a well-known method called rubber-hose cryptography. This works by capturing someone who knows the secret code, putting them in a basement and hitting them with a rubber hose until they tell you what it is. In the past this has been quite an effective method of breaking codes, and it is sometimes overlooked. There's also something called a man-in-the-middle attack, which works as follows. Alice and Bob are communicating with each other over what they think is an open channel, but Eve actually runs one of the servers between Alice and Bob and can change the messages they send to each other. When Alice thinks she is sending Bob her public key, she is actually sending it to Eve, and Eve replaces it with Eve's own public key and sends that on to Bob. Bob then uses Eve's key to send messages to Alice; Eve intercepts them, decodes them, re-encodes them using Alice's key and passes them on, so Alice receives them as expected. So if someone has control over one of the servers between Alice and Bob, they can subvert the system and read all messages between A and B, even if A and B are using RSA. Another method is that users do something stupid. Ian Cassels was a number theorist who worked for British code breaking in the Second World War, and he once said that the British never managed to break any German code without somebody on the German side doing something really stupid that gave them a way into it. People doing something stupid is exceedingly common. Let me give a typical example of something stupid you could do with the RSA algorithm. Suppose Alice is using two primes P and Q to communicate with Bob, but she also wants to communicate with someone else using a different key.
So she needs two more primes, decides that finding large primes is a lot of work, and just reuses the prime P, using the primes P and R to communicate with somebody else. Then she publishes PQ and PR. Each of these numbers is a product of two big primes, so each of them on its own is very difficult to factorize. But if an attacker gets hold of both of them, the attacker can compute the greatest common divisor of PQ and PR, which is just P, quickly with Euclid's algorithm, and so factorize both numbers. So one stupid thing you could do is reuse large primes. You might think nobody would be that stupid, but people have in fact done things that stupid. For example, people used to use one-time pads to communicate, and, as the name suggests, you should not use a one-time pad twice: if you do, it can be broken. I think the Russians were using one-time pads and, to economize, occasionally used them twice, and after the Second World War the Americans noticed this, managed to break the codes, and caught a number of Russian spies in America. So you should never underestimate people's stupidity; as they say, people who try to devise foolproof methods always underestimate the ingenuity of fools. Next, you can just monitor traffic. Suppose Alice and Bob are planning an attack on Eve and send each other completely secure messages, which Eve monitors but is unable to decode. That doesn't really matter: Eve might notice one day that the number of messages between Alice and Bob has gone up by a factor of 10, and this tells Eve that Alice and Bob are up to something, even if she doesn't know what.
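Once an attacker has both public moduli, the attack is essentially one line. A minimal Python sketch, using the well-known Mersenne primes 2^61 - 1, 2^31 - 1 and 2^89 - 1 as stand-ins for P, Q and R (real keys would use large random primes); math.gcd runs a Euclid-style algorithm that stays fast even for numbers with thousands of digits.

```python
import math

# Stand-in primes (known Mersenne primes); Alice wrongly reuses P in two keys.
P = 2**61 - 1
Q = 2**31 - 1
R = 2**89 - 1

M1 = P * Q   # public modulus used with Bob
M2 = P * R   # public modulus used with someone else

# Eve sees only M1 and M2, yet their gcd exposes the shared factor.
shared = math.gcd(M1, M2)
print(shared == P)                             # True
print(M1 // shared == Q, M2 // shared == R)    # True True: both moduli are now factored
```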
So you can get advance warning of an attack just by monitoring the number of messages, and this is why military communications will often send messages constantly, even if most of them don't contain any information, so that the enemy can't learn anything from the volume of traffic. Another thing you can do is direction finding. If Alice is on a ship sending messages, Eve might not be able to decode them, but she can still tell where they're coming from and so get information about where the ship is. An example comes from the Second World War. When the Japanese attacked Pearl Harbor, the Americans got no warning, because the Japanese ships were maintaining radio silence; they weren't sending any messages at all. Furthermore, the Japanese had taken the radio operators off these ships and put them on other ships, because the Americans could tell which radio operator was which: different people tap out Morse code in slightly different ways, and this was known as the Morse code operator's "fist". With the operators moved, the Americans thought the ships were somewhere they weren't. Eve can also send fake messages to A or B. If A and B have published their public keys, then anyone can send messages to B purporting to come from A. There are ways around this: A can sign messages using the inverse of A's public function, in such a way that nobody except A could have produced the signature. So if you're feeling paranoid about Eve sending fake messages, you probably want to look into digital signatures. Next, there is social engineering, which is a surprisingly common way of getting people's credit card numbers.
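A minimal Python sketch of signing. It assumes a toy RSA key (P = 61, Q = 53, K = 17) and, as a simplification, reduces a SHA-256 hash of the message mod M; real signature schemes use a standardized padding scheme and full-size keys, so this only shows the idea. A applies the secret inverse map h ↦ h^J mod M, and anyone can check the result with the public map s ↦ s^K mod M. The helper names are hypothetical, and pow(K, -1, phi) needs Python 3.8 or later.

```python
import hashlib

P, Q = 61, 53                         # toy primes: far too small for real use
M = P * Q                             # public modulus
K = 17                                # public exponent
J = pow(K, -1, (P - 1) * (Q - 1))     # private exponent, known only to A


def digest_mod(message: bytes) -> int:
    """Toy hashing step: SHA-256 reduced mod M (real schemes use padding instead)."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % M


def sign(message: bytes) -> int:
    """Only A can do this, because it needs the secret exponent J."""
    return pow(digest_mod(message), J, M)


def verify(message: bytes, signature: int) -> bool:
    """Anyone can check this using only the public values K and M."""
    return pow(signature, K, M) == digest_mod(message)


s = sign(b"attack at dawn")
print(verify(b"attack at dawn", s))   # True
```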
Social engineering means you basically just ring someone up and ask them for their credit card number. It's amazing how often you can ring up a random person, tell them they've got a free offer or a very cheap offer on something as long as they give their credit card number, and they will simply give their credit card number over the phone to someone who has just rung them up. It's not just credit card numbers; it's also things like passwords. For example, you can send someone an email saying their password has been compromised and asking them to log on to some site to change it. Of course, you send them to a fake site, and when they log on and type in their old password in order to change it, you've got their password and can use it to log on to their account. You've probably all had phishing messages asking you to log on to some site because your account has been compromised or you've run out of storage or something. So social engineering is a very common way of breaking codes. Another point is that messages, at least in earlier times, often started with preset phrases. For instance, you might write "Dear Admiral So-and-so" at the start of every message. If you know how a message starts or ends, it can be a lot easier to break. So, at least in the Second World War, messages had to be padded with random junk: the first 20 or so characters and the last 20 or so characters were supposed to be random, and this actually caused several problems. At one point in the Second World War, the British issued people with poetry books, and all messages were supposed to start with a line of poetry from the book. The idea was that a line of poetry was a sort of random junk that would not turn up in a military context.
This worked fine until the Germans figured out which poetry book the British were using; then they could break the messages, because it made it easy for them to guess the first few or last few characters. There's another rather notorious example of padding going wrong in the Second World War. Nimitz once sent a message to Halsey asking where his ships were. This was during the Battle of Leyte Gulf: Halsey had been deceived by a Japanese decoy, and his fleet, which was maybe the most powerful fleet ever, was heading away from the battle instead of towards it. So Nimitz sent Halsey the message "Where is, repeat, where is Task Force 34", which was under Halsey's command, meaning he wanted to know where it was so that it could be directed back to the battle. Unfortunately, the radio operator sending the message padded it by adding "the world wonders" at the end. This should have been removed by the radio operator at the other end, but it wasn't. So Halsey got the message "Where is, repeat, where is Task Force 34, the world wonders", which sounded like an extremely insulting way of telling him he had screwed up, and Halsey apparently had a temper tantrum when he got it. Anyway, padding of this kind, meant to hide standard phrases, is no longer so useful for modern cryptography like the RSA algorithm, but it probably does no harm and would be reasonable practice in case somebody thinks of something clever. (RSA as used in practice does add random padding to each message, through schemes such as OAEP, so that the same message does not always encode to the same number.) By the way, another way of making messages more difficult to read is to compress them before sending them. This is good for two reasons. First, it makes the message shorter. Second, it makes the message harder to decode: for example, one way of breaking messages in rather old systems is to observe that some three-letter group occurs quite often in the message, which might then be the word "the" or "and".
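A quick way to see this effect, sketched in Python with the standard zlib module (DEFLATE compression, one choice among many): a text full of repeated words shrinks a lot, because DEFLATE replaces later occurrences of a repeated phrase with short back-references to an earlier occurrence. The sample text is made up.

```python
import zlib

# Made-up message with lots of repeated words and phrases
text = ("the convoy will sail at dawn and the escort will meet the convoy "
        "at noon and the convoy will reach the harbour at dusk. ") * 10
raw = text.encode("utf-8")

compressed = zlib.compress(raw, 9)   # compress first, then encrypt the result
print(len(raw), len(compressed))     # the compressed form is much shorter

# The receiver decrypts and then decompresses to get the original back.
assert zlib.decompress(compressed) == raw
```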
If you run the message through a compression algorithm, it removes things like short phrases that recur repeatedly, because those are exactly what the compression program squeezes out. If you want extra security, for instance because you're feeling really paranoid and worried that someone might find a way to factorize large numbers, you can use a different trapdoor function; there are several others around, based on different principles. And if you're also worried that someone might compromise one of those, you can just put several methods in series. You first encode your message using RSA, then encode the result using some trapdoor function based on elliptic curves, then encode that using some other secure method, and you're safe unless someone has secretly managed to break all three. So for those of you who are really paranoid, you can chain several encodings in sequence. Of course, this makes the system rather more cumbersome to use. And this by no means exhausts the methods of breaking codes. For example, your operating system might be compromised: someone might have installed a keystroke logger that records everything you type and sends it off. Then it doesn't matter how securely your message is encoded, because the attacker has already seen the keystrokes you used to type it. Okay, I think that's pretty much all I'm going to say about cryptography. This is only the absolute beginning of the subject; cryptography is now a major subject in its own right.
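The "several methods in series" idea can be sketched generically in Python: encode with each layer in turn, and decode by undoing the layers in reverse order. To keep the sketch self-contained, both layers here are toy RSA keys with made-up small primes, standing in for the different trapdoor functions mentioned (an elliptic-curve layer would need far more code). The second modulus is chosen larger than the first so that every output of layer one is a valid input to layer two. The helper names rsa_layer, cascade_encrypt and cascade_decrypt are hypothetical.

```python
from typing import Callable, List, Tuple

Layer = Tuple[Callable[[int], int], Callable[[int], int]]  # (encode, decode) pair


def rsa_layer(p: int, q: int, k: int) -> Layer:
    """Toy RSA layer: encode x -> x^k mod pq, decode y -> y^j mod pq (Python 3.8+ for pow)."""
    m = p * q
    j = pow(k, -1, (p - 1) * (q - 1))
    return (lambda x: pow(x, k, m), lambda y: pow(y, j, m))


def cascade_encrypt(layers: List[Layer], x: int) -> int:
    for encode, _ in layers:            # apply the layers in order
        x = encode(x)
    return x


def cascade_decrypt(layers: List[Layer], y: int) -> int:
    for _, decode in reversed(layers):  # undo them in reverse order
        y = decode(y)
    return y


layers = [rsa_layer(61, 53, 17), rsa_layer(89, 97, 5)]  # moduli 3233 < 8633
c = cascade_encrypt(layers, 65)
print(c, cascade_decrypt(layers, c))
```

Breaking the cascade requires undoing every layer, at the cost of doing all the encodings and decodings each time.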