Hey, I'm Mr. Gibson, here with the next lesson in cryptography. Today we're going to use some mathematics, specifically some statistics and probability, to look at some characteristics of our text. This is called frequency analysis. Going back to the very first activity in the course, we know that certain letters appear more often than others in the English language. We're going to quantify that today, because having specific values is going to be very helpful for analyzing ciphertext more quantitatively.

So let's take a look at a few different books. We're going to be working with books that are in the public domain, which means we can easily pull up their entire text free of copyright. First, Pride and Prejudice. I ran this through some Python code and generated the following bar chart, and we'll learn how to do that in this unit. This bar chart shows us a high percentage of E's: somewhere between 12 and 14 percent of all letters in Pride and Prejudice are E. Just under 8% are A's. N and O are pretty popular; so are H and I, and so are R, S, and T.

And it turns out that if we pick a completely different book, like Alice in Wonderland, we see the same patterns. The individual percentages might vary just a little bit, but again A and E are pretty high up there, and so are H and I, N and O, and R, S, and T. If you go to Frankenstein, again it's a little bit different, but mostly the same patterns. And Little Women is no different.
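Here's a minimal sketch of the kind of letter counting behind those bar charts. It is not the code used in the lesson; the function name `letter_percentages` and the choice to ignore everything except the 26 letters A to Z are assumptions made for illustration.

```python
from collections import Counter
import string


def letter_percentages(text):
    """Return {letter: percent of all A-Z letters in text} for A through Z."""
    # Uppercase everything and keep only the 26 English letters.
    letters = [ch for ch in text.upper() if ch in string.ascii_uppercase]
    counts = Counter(letters)
    total = len(letters)
    if total == 0:
        # No letters at all: report zero for every letter.
        return {letter: 0.0 for letter in string.ascii_uppercase}
    return {letter: 100 * counts[letter] / total
            for letter in string.ascii_uppercase}


# Opening sentence of Pride and Prejudice, used as a tiny sample.
sample = ("It is a truth universally acknowledged, that a single man in "
          "possession of a good fortune, must be in want of a wife.")

percentages = letter_percentages(sample)
# Show the five most frequent letters in the sample.
for letter in sorted(percentages, key=percentages.get, reverse=True)[:5]:
    print(letter, round(percentages[letter], 1))
```

A single sentence is far too short to settle into the patterns described above; running the same function on the full text of a book is what produces stable percentages.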
So we're going to see that the English language as a whole has a pretty standard distribution of letters. This is a table pulled from Wikipedia: if we take a really large volume of text, say an entire library full of books, the English language settles down into these percentages. Reading the fine print, A is about 8.167% of all letters in the English language, E is the highest at around 12.7%, Z is pretty low at roughly 0.07%, and everything else ranges in between. The bar chart on the right is what we're going to refer to as the standard distribution of English letters, and it's going to be helpful as we move forward.

So let's take a look at how knowing the standard distribution of the English language can help us learn a little bit about our ciphertext. Here, the ciphertext is the entire text of Pride and Prejudice run through a Caesar cipher using a key of three, and I've highlighted the corresponding bars from the standard distribution. Our E's in the standard distribution are roughly 12-point-something percent, and we can find the corresponding bar in the ciphertext, which is mapped to H. What's the relationship? Those letters are three apart, and if you know anything about the Caesar cipher,
that shouldn't be too surprising. The way that cipher works is that we increase the value of our plaintext letter by three. So if E is normally 4, then 4 plus 3 is 7, and 7 maps to H in the ciphertext, and we do that for all of the letters. So using an additive cipher like Caesar preserves the order of our plaintext when we get to the ciphertext. What I mean by that is we can see the same patterns. We used to have an A and an E that were relatively big spikes near each other; now we have a D and an H. We used to have an H and an I in the plaintext, or in the standard distribution, and now we see that at K and L. N and O in the standard distribution are now at Q and R, and R, S, and T are now at U, V, and W in the ciphertext. We see similar patterns if we use a key of seven or a key of 22: the bars just keep shifting down and wrapping around to the beginning.

Why this is really helpful is that if we don't have the plaintext distribution, it doesn't matter, because we know the standard distribution is always going to be the same. So if we collected a ciphertext and were trying to figure out what algorithm or what key was used, we can quickly look at this and say: here's a ciphertext, and we can see those order-preserved spikes. The K and the O go back to the A and the E, the R and the S go back to the H and the I, and so on. So that's how we know it's Caesar. As for the key, we can figure that out by counting how many spots
O has shifted over, since we think it came from E. If that's true, if we think this high-frequency spike at the letter O in the ciphertext maps back to E in the plaintext, we can just count that those are 10 units apart. So that must mean that the key used in the Caesar cipher was 10.

Now this gets a little bit trickier when we're dealing with ciphers that use multiplication. What happens with multiplication is that we lose that order preservation of the bars. We can see that here. Here's our standard distribution of characters, and here's a ciphertext created from Pride and Prejudice with a multiplicative key of three. While it's easy for us to identify that the M in the ciphertext likely came from the E in the standard distribution, or in our plaintext, that might not seem like enough for us to figure out the key if we didn't already know it. You can also see that even though we think M is E, the other bars are almost randomly scattered about. They don't have the same H-I spike, or the N-O spike, or the R-S-T spike. All of those bars are mixed in a way that makes it hard to figure out. It turns out this actually is enough for us to figure out that the key is three.
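The two key guesses above can be sketched in a few lines of Python. This is only an illustration, not necessarily the method a later lesson will use. It assumes letters are numbered A = 0 through Z = 25, that Caesar adds the key mod 26, that the multiplicative cipher multiplies by the key mod 26, and that a usable multiplicative key must share no factor with 26 so the cipher can be undone. The function names are made up for this sketch.

```python
from math import gcd

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def caesar_key_from_spike(cipher_letter, plain_letter="E"):
    """Shift that sends plain_letter to cipher_letter when c = (p + key) mod 26."""
    return (ALPHABET.index(cipher_letter) - ALPHABET.index(plain_letter)) % 26


def multiplicative_keys_from_spike(cipher_letter, plain_letter="E"):
    """All keys k with gcd(k, 26) == 1 where (p * k) mod 26 equals c."""
    p = ALPHABET.index(plain_letter)
    c = ALPHABET.index(cipher_letter)
    return [k for k in range(1, 26)
            if gcd(k, 26) == 1 and (p * k) % 26 == c]


# Caesar: the biggest spike sits at O, and we guess it came from E.
print(caesar_key_from_spike("O"))           # 10
# Caesar with the Pride and Prejudice example: the spike sits at H.
print(caesar_key_from_spike("H"))           # 3
# Multiplicative: the biggest spike sits at M, and we guess it came from E.
print(multiplicative_keys_from_spike("M"))  # [3]
```

Under those assumptions, only one valid multiplicative key sends E to M, which is one way to see why the single E-to-M guess can pin the key down to three.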
We'll talk more about that in a future lesson. But it does get even harder when we start looking at affine ciphers. Here's a ciphertext that was created using affine with an additive key of two and a multiplicative key of seven. Again, we can quickly spot the likely candidate for E: the biggest spike in the ciphertext maps back to E in the standard distribution, and in this case it's also E. But there's not a lot of other information in this bar chart that might help us figure out what the two keys were if we didn't know them.

Like in this situation: here's a bar chart where I didn't tell you the keys. We might be able to guess that M in this ciphertext is E in the standard distribution, but that's about all we can figure out by looking at this. We don't have a lot of other information that we're able to tell from the bar chart.

And that's how we're going to get started using frequency analysis. We'll continue to build on this and find ways to use these bar charts to help us determine the keys, either algebraically or computationally, but it's the first step in the right direction for our journey into cryptanalysis. Thanks for watching. We'll catch you on the next one.
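As a companion to the affine examples in this lesson, here is a small sketch of why one matched letter is not enough to recover two keys. It assumes the affine cipher multiplies first and then adds, c = (p * mult_key + add_key) mod 26, with A = 0 through Z = 25; that ordering is an assumption, though it does agree with the lesson's example where keys of seven and two send E back to E. The function names are hypothetical.

```python
from math import gcd

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def affine_encrypt_letter(letter, mult_key, add_key):
    """Encrypt one letter with c = (p * mult_key + add_key) mod 26 (assumed order)."""
    p = ALPHABET.index(letter)
    return ALPHABET[(p * mult_key + add_key) % 26]


def affine_candidates(cipher_letter, plain_letter="E"):
    """Every (mult_key, add_key) pair that sends plain_letter to cipher_letter."""
    p = ALPHABET.index(plain_letter)
    c = ALPHABET.index(cipher_letter)
    # For each usable multiplicative key, exactly one additive key fits.
    return [(a, (c - a * p) % 26) for a in range(1, 26) if gcd(a, 26) == 1]


# Multiplicative key 7, additive key 2: E lands back on E.
print(affine_encrypt_letter("E", 7, 2))   # E

# Unknown keys, spike at M guessed to be E: many key pairs still fit.
candidates = affine_candidates("M")
print(len(candidates))                    # 12
print(candidates[:3])
```

With twelve key pairs all consistent with the single guess that M is E, the bar chart alone leaves the keys undetermined; narrowing them down takes a second matched letter or a search over the candidates.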