One of the other important areas where mathematics can be brought to bear on a cryptographic problem involves something called the index of coincidence. The idea is the following. Suppose I end up with some sort of encrypted message, something that looks a little bit like this, for example, and I want to have some idea of what type of encipherment has been used. This is not particularly important nowadays, because in general we know the type of encipherment that's being used, but historically it has been an important question. There are many different types of ciphers that I could use, and one of the things I can do is examine what's called the index of coincidence.

The basic idea is this. Suppose I have some text written in a natural language like English: "Four score and seven years ago," and so on. Suppose I pick two letters from this text completely at random, for example this letter here and this letter here, d and n. Most of the time when I pick two letters at random they will be different letters, but from time to time the two letters will be the same. I may pick this i here and, as my second letter, this i here. So every now and then I will have a coincidence: the two letters chosen are the same.

If you look at a typical English text, because of the distribution of letters in the English language, this will occur about 6.8 percent of the time. On the other hand, if I have a text consisting of letters chosen completely at random, which is to say that all 26 letters are equally likely, then the index of coincidence will be much lower, typically around 3.8 percent (roughly 1 in 26), and everything from a pure
English text to some not quite completely random set of letters will give me some intermediate index of coincidence. This quantity is actually preserved by a substitution cipher, so if a ciphertext has a high index of coincidence, it is reasonable to conclude that it was produced using a substitution cipher.

So how do we compute this index of coincidence? Suppose I have a particular letter that appears k times among the n letters in my ciphertext. There are n choose 2, that is n(n - 1)/2, different ways that I can pick two letters at random from my ciphertext, and there are k choose 2, that is k(k - 1)/2, ways that I can pick the designated letter twice. So if I pick two letters completely at random, the probability that both are this particular letter is k(k - 1)/2 over n(n - 1)/2, and after all the dust settles that's k(k - 1) / (n(n - 1)).

Now, that is just one of the many letters that I have. If I want the probability that the two letters will be the same letter, without focusing on any individual letter, then my index of coincidence is found by summing up all of the probabilities of this nature. It's the sum of the products k_i(k_i - 1), where k_i is the number of times the i-th symbol appears, divided by n(n - 1).

So let's go back to our ciphertext. It's this horrible mess of letters, and before my eyes glaze over I can try to find out how many times a particular symbol occurs. I'm just going to count the occurrences of each letter. The letter a appears all of these times here, and if I count them I see that it occurs 22 times in the ciphertext. Likewise, I might
try to count the letter b and look for the occurrences of b. There's one here and one here, and altogether it occurs a whole two times. I can do that for all the remaining letters of the ciphertext.

Now that I have this information, I can use these frequencies to compute the index of coincidence. It turns out there are 270 characters altogether, so my denominator is going to be n(n - 1), which is 270 times 269. My numerator is going to be the product of each of these counts with one less than itself, all of those products added together. So I'll take 22 times 21. My next count is 2, so that's 2 times 2 minus 1. The next count is zero, and zero times anything is zero, so that term doesn't matter. Then 19 times one less, and so on and so forth. When I add those up and divide, I get about 0.0718, and that's a relatively high index of coincidence. It says that if I pick two letters at random from this ciphertext, they will be the same letter about 7% of the time. That's high enough that it is reasonable to conclude that the ciphertext was in fact produced using a substitution cipher of some sort.
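
As a rough sketch of the computation just described, the count-and-sum step might look like this in Python. The function name index_of_coincidence and the sample string are illustrative assumptions, not part of the lecture, and the lecture's 270-character ciphertext is not reproduced here. The sketch also assumes that only letters count and that case is ignored.

```python
from collections import Counter


def index_of_coincidence(text: str) -> float:
    """Probability that two letters picked at random from text are the same."""
    # Assumption: only alphabetic characters count, and case is ignored.
    letters = [ch.upper() for ch in text if ch.isalpha()]
    n = len(letters)
    if n < 2:
        raise ValueError("need at least two letters")
    # k_i is the number of times the i-th symbol appears.
    counts = Counter(letters)
    # Sum of k_i * (k_i - 1) over all symbols; symbols with k_i = 0 never appear here.
    numerator = sum(k * (k - 1) for k in counts.values())
    return numerator / (n * (n - 1))


# Hypothetical sample text, far shorter than the lecture's ciphertext.
sample = "Four score and seven years ago"
print(index_of_coincidence(sample))

# A Caesar shift is one simple kind of substitution cipher. It only relabels
# the letters, so the counts k_i, and therefore the value, stay the same.
shifted = "".join(
    chr((ord(c) - ord("A") + 3) % 26 + ord("A"))
    for c in sample.upper()
    if c.isalpha()
)
print(index_of_coincidence(shifted))
```

On a text this short the value is only a noisy estimate. The figures quoted above, about 6.8 percent for English and about 3.8 percent for random letters, describe much longer texts.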