In our previous lessons, we saw how we could use frequency analysis with bar charts to make good guesses about likely plaintext-to-ciphertext mappings for the algorithms we've seen so far. In this lesson, we're going to learn how to automate this even further. You might have noticed that even when we have good guesses, we still have to follow up on them or manually inspect our work. It would be great if we could just decrypt a message with one candidate key, score how well we did, and then repeat that for every possible key. That's a kind of iteration on brute forcing, except we never need to manually inspect the decrypted text: a scoring mechanism picks the most likely outcome for us. The mechanism we're going to look at is called chi-squared scoring. If we have a message, we don't even need to look at it anymore to determine whether it's an English message or one that's likely encrypted. Consider the two bar charts down below on the screen. For the one on the left, the pattern, or rather the lack of the pattern that matches the English language, tells us it's almost certainly not an English-language message, likely because it's encrypted. And here is the ciphertext we'll be working with in this particular lesson. Notice we really can't tell anything about it: it's been stripped of all punctuation and all spacing, leaving just our nice, cleaned message in blocks of five characters. We also have some information about a potential decryption of this message. We assumed it used the affine cipher and picked a multiplicative key of 21 and an additive key of 4. We can take the individual counts of each character in the ciphertext to make the bar chart that we see on the right.
Notice that here we're working with raw counts rather than proportions, and we'll see why in just a moment. The same patterns show up either way, so our usual analysis techniques still apply. The lack of a pattern here indicates there isn't much chance that this is our plaintext message: we don't see the telltale A and E spikes, we don't see the H-I, the N-O, the R-S-T. None of that appears. So how could we quantify this, so that instead of sitting here looking at the bar chart we can use numbers to determine that this is not a likely successful decryption of our plaintext? Well, let's start by looking at how many letters there are. If we sum all of the letters in that message, there were 994, so just shy of a thousand characters. That's going to be helpful in just a moment. What we're going to do is compute how many of each character we would expect to see. Remember, we know the likely chance of an A or a B or a C from the proportions we learned in a previous lesson. For example, I know that about 8.167% of English text is the character A. So if this message is 994 characters long, I can take that proportion of the text to compute how many A's I would expect to see if I had successfully decrypted this message and it was representative of the English language. I can do the same thing for all of the characters. So to estimate how many B's I would expect to see after a successful decryption, it's 0.01492 times 994, and so on for C, D, E, all the way down to X, Y, and Z. Computing that, we get our expected character counts. And this is why we've been working with counts from the beginning.
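To make the expected-count step concrete, here's a minimal Python sketch. The frequency table is a commonly cited set of English letter proportions; the lesson's own table may differ slightly, and the names `ENGLISH_FREQ` and `expected_counts` are just illustrative choices.

```python
# Commonly cited English letter proportions (an assumption; the
# lesson's exact table may differ slightly in later decimal places).
ENGLISH_FREQ = {
    'A': 0.08167, 'B': 0.01492, 'C': 0.02782, 'D': 0.04253, 'E': 0.12702,
    'F': 0.02228, 'G': 0.02015, 'H': 0.06094, 'I': 0.06966, 'J': 0.00153,
    'K': 0.00772, 'L': 0.04025, 'M': 0.02406, 'N': 0.06749, 'O': 0.07507,
    'P': 0.01929, 'Q': 0.00095, 'R': 0.05987, 'S': 0.06327, 'T': 0.09056,
    'U': 0.02758, 'V': 0.00978, 'W': 0.02360, 'X': 0.00150, 'Y': 0.01974,
    'Z': 0.00074,
}

def expected_counts(message_length):
    """Expected count of each letter in English text of the given length."""
    return {letter: freq * message_length
            for letter, freq in ENGLISH_FREQ.items()}

expected = expected_counts(994)
# expected['A'] is 0.08167 * 994, about 81.18, matching the lesson's number
```

Since the proportions sum to (essentially) 1, the expected counts sum back to the message length, which is what makes the comparison with the observed counts fair.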
We want to be able to directly compare the expected character counts we'd assume to see after a successful decryption with the character counts we actually have from this potential decryption. We call these candidate plaintexts: we don't know whether they're successfully decrypted or not, but they're candidates. Alright, so we've got our counts and we've got the expected counts we would have assumed to see if we did it correctly. How does that help us? Well, we can look at the distribution of our candidate text, here in blue, against what we would have assumed to see if we decrypted successfully, here in orange. We can see that in some places we had close values, like A, but in other places we were way off. We're going to quantify those differences as what we call errors. And we want to look at the error not just for one or two letters, where we're close or far apart, but for all 26 characters in our alphabet. We'll compute those differences and use them to help create a score that tells us how we did on the whole. So let's go ahead and start computing our errors. We're going to keep building this table out to get more and more information about how our candidate plaintext fared compared to what we would expect in the English language. To compute the error for a single letter, we take the count that appeared in our candidate plaintext and subtract the number of characters we would have expected to see in a successful decryption: 69 minus 81.17998. Now we'll do the same thing for B, C, and onward to compute the remaining error values. We can see that A came in a little under what we expected, at about negative 12, whereas B was a little over, at about 3, and C was far bigger than we expected, 84 more to be exact. And so on all the way down the line.
Now, you might be thinking: I know how to get a score for how well we did on the whole, let's just add up all of these errors. Well, it turns out that due to the method we used, those errors will always sum to zero. The reason why is that with 994 characters total, if one letter came in over its expected count, that has to be balanced out by equal underperformance somewhere else. So for example, because we had fewer A's than we expected, somewhere along the line we had to have more of some other letter than we expected. It might not appear all in one lump, but it looks like C is contributing a lot of the positive error, whereas A and maybe a couple of other characters are contributing the negative error. Because of this, that column will always sum to zero. Not a great score if the sum of the errors always gives you the same number, especially when that number is zero. So we have to do a little better than just summing up these errors. One way that statisticians typically deal with this issue, because it shows up time and time again, not just in cryptography, starts from the observation that the errors sum to zero because some are positive and some are negative. But if we were to square all of those errors, so squaring our negative 12.17998 gives us a squared error of 148.352 and so on down the line, these numbers are always positive. So if we sum them up, we're now guaranteed to get a positive number, and in fact, in this case, it's about 59,739. That is our sum of squared errors, and it's not a bad metric. We could use this and then try different keys to get different candidate plaintexts and score them the same way.
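The error and squared-error steps can be sketched as follows. The helper names (`letter_counts`, `errors`, `sum_squared_errors`) are hypothetical, and the observed and expected counts are passed around as plain dictionaries keyed by letter.

```python
from collections import Counter
import string

def letter_counts(text):
    """Count of each letter A-Z in an uppercase candidate plaintext."""
    counts = Counter(c for c in text if c in string.ascii_uppercase)
    return {letter: counts.get(letter, 0) for letter in string.ascii_uppercase}

def errors(observed, expected):
    """Raw error per letter: observed count minus expected count."""
    return {letter: observed[letter] - expected[letter] for letter in observed}

def sum_squared_errors(observed, expected):
    """Sum of squared errors over all letters; always non-negative,
    unlike the raw errors, which cancel out by construction."""
    return sum((observed[letter] - expected[letter]) ** 2 for letter in observed)
```

A quick sanity check of the cancellation property: if observed is `{'A': 7, 'B': 3}` and expected is `{'A': 5, 'B': 5}`, the raw errors are +2 and -2 and sum to zero, while the squared errors sum to 8.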
And we could compare those scores, knowing that the smaller the sum of squared errors, the more likely it is that our key guess is correct, because the correct key should produce the least error under this method. But we can actually do a little better. The way we do better is by taking into account not just the size of the error without any context, but the size of the error relative to the count we expected to see. So let's take a look at the characters B and Y. B had an error of about 3 against an expected count of about 14. Y had an error of about 5 against an expected count of about 20. So they have roughly the same percentage of error: 3 off of 14, or 5 off of 20, is roughly 25 percent-ish. But look at the squared errors they contribute; they're drastically different. B has a squared error of about 10, whereas Y has a squared error of almost 30. What we'd like is for those two letters to contribute about the same amount to the score, since as a percentage they're about equally far off. The way we can do that is by taking the squared error and dividing it by the expected character count. This is going to make both of those numbers smaller: 10 divided by 14 certainly gets smaller, and 28 divided by 20 gets smaller too, but by a larger scale factor. So hopefully when we're done, the resulting values, which we're now going to call normalized squared errors, will be much closer together, since again the error relative to the expected number of characters is about the same for both letters. So let's go ahead and finish out this new calculation, our normalized squared error.
And again, the way we're going to do that is to take each squared error and, to make it relative to the expected size, divide by that expected count. So for the character A we take 148.352 and divide by 81.17998, and then do the same down the line. For B we take the squared error of 10.046 and divide by the expected number of B's we'd see in our decrypted message, 13.8. When we do that, we can see a normalizing effect on our squared errors. The characters B and Y are now contributing roughly the same amount, one around 0.7 and the other around 1.5, and that seems to have done a nice job of balancing things out. If we total up the sum of our normalized squared errors, we get about 8,480. This total is what we call our chi-squared score for this candidate plaintext. We would need to do this quite a few times to figure out whether that's a good score or a bad score. But remember the goal here: if we successfully decrypted our ciphertext, and what we were analyzing was truly plaintext, the difference between the count of each character and the expected count of each character should be very small, which would lead to small errors, small squared errors, and small normalized squared errors. So when we sum the normalized squared errors to get our chi-squared score, our goal is something close to zero, because that's a sign that we've actually successfully decrypted our ciphertext.
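Putting the pieces together, the chi-squared score is just the sum of the normalized squared errors. A minimal sketch, again taking the observed and expected counts as dictionaries (the function name is an illustrative choice):

```python
def chi_squared_score(observed, expected):
    """Chi-squared score: for each letter, square the error (observed count
    minus expected count) and divide by the expected count, then sum over
    all letters. A score near zero suggests the candidate looks like English."""
    return sum((observed[letter] - expected[letter]) ** 2 / expected[letter]
               for letter in expected)
```

Dividing by the expected count is exactly the normalization discussed above: a letter that misses its expected count by 25% contributes about the same whether it's a common letter or a rare one.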
So, to summarize: to calculate the chi-squared score for a candidate plaintext, you first need to choose your keys and algorithm and actually generate the candidate plaintext. Once you've got it, you count each character in that message, you compute the expected count of each of those characters, and then you compute the normalized squared error for each character. We broke that into a few steps: first compute the raw error, which is the actual count you saw in your message minus the expected count you'd see if you decrypted correctly; then square it to make sure it's positive; then divide by the expected count. Do that for each character, and then sum up those normalized squared errors to get your chi-squared score. We're now going to do this for a bunch of different key pairings for the affine decryption method, to see whether we can figure out which plaintext candidate is the actual plaintext based on the score alone, without ever having to look at the messages we're decrypting. So what we'll do now is continue with this process: assume the message was encrypted with some pair of additive and multiplicative keys, decrypt the message using those keys to create candidate plaintexts, score them, and then rank them by that score. We're not going to show the code that generated this list, but you'll get to work on creating your own here in a little while. We can see that we have a clear winner with the lowest score of 66, not even close to the next score of 1,312: the additive key of 4 and multiplicative key of 17. Even without looking at the candidate plaintext itself, we could feel pretty good that it was going to be the right choice.
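The lesson doesn't show the code that produced the ranked list, so here is one way it could look, as a hedged sketch. `pow(a, -1, 26)` computes the modular inverse of the multiplicative key (Python 3.8+), only multiplicative keys coprime with 26 are valid, and the frequency table is a commonly cited one that may differ slightly from the lesson's, so the exact scores won't reproduce the 66 and 1,312 shown on screen.

```python
import string
from collections import Counter
from math import gcd

# Commonly cited English letter proportions (an assumption).
ENGLISH_FREQ = {
    'A': 0.08167, 'B': 0.01492, 'C': 0.02782, 'D': 0.04253, 'E': 0.12702,
    'F': 0.02228, 'G': 0.02015, 'H': 0.06094, 'I': 0.06966, 'J': 0.00153,
    'K': 0.00772, 'L': 0.04025, 'M': 0.02406, 'N': 0.06749, 'O': 0.07507,
    'P': 0.01929, 'Q': 0.00095, 'R': 0.05987, 'S': 0.06327, 'T': 0.09056,
    'U': 0.02758, 'V': 0.00978, 'W': 0.02360, 'X': 0.00150, 'Y': 0.01974,
    'Z': 0.00074,
}

def affine_decrypt(ciphertext, mult_key, add_key):
    """Invert E(x) = (mult_key * x + add_key) mod 26 on letters A-Z."""
    a_inv = pow(mult_key, -1, 26)  # modular inverse of the multiplicative key
    return ''.join(
        chr(a_inv * (ord(c) - ord('A') - add_key) % 26 + ord('A'))
        for c in ciphertext
    )

def chi_squared(text):
    """Chi-squared score of a candidate plaintext against English counts."""
    counts = Counter(text)
    return sum(
        (counts.get(letter, 0) - ENGLISH_FREQ[letter] * len(text)) ** 2
        / (ENGLISH_FREQ[letter] * len(text))
        for letter in string.ascii_uppercase
    )

def rank_affine_keys(ciphertext):
    """Score every valid (multiplicative, additive) key pair; best first."""
    results = []
    for a in range(1, 26):
        if gcd(a, 26) != 1:  # only the 12 keys coprime with 26 are invertible
            continue
        for b in range(26):
            results.append((chi_squared(affine_decrypt(ciphertext, a, b)), a, b))
    return sorted(results)
```

With a few hundred characters of English, the correct key pair typically wins by a wide margin, which matches the lesson's observation that the winning score wasn't even close to the runner-up.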
And when we actually look at the text, we can confirm that this lowest-scoring candidate does in fact contain English. If we were to extract it all out, we'd see the following message: "In a hole in the ground there lived a hobbit. Not a nasty, dirty, wet hole, filled with the ends of worms and an oozy smell, nor yet a dry, bare, sandy hole with nothing in it to sit down on or to eat: it was a hobbit-hole, and that means comfort." You might recognize that as the opening of the book The Hobbit. So that's it for chi-squared scoring. It's a great statistical measure of how similar two distributions are: the character distribution of our candidate plaintext and the expected character distribution of the English language. We'll see that the chi-squared scoring method, while statistically sound and very effective, does have some downsides. We have to be able to try all of the possible key pairings to feel confident that the lowest score gives us the correct keys, and we have to have a relatively long message. But we'll see that's pretty much true for all of our techniques.