Now that we know how to take our text and convert it to a binary or numerical representation using ASCII or UTF-8, we're going to learn about a new category of ciphers called stream ciphers, which make use of this binary representation. A stream cipher is pretty similar to what we've already seen, especially the Vigenère cipher: our plaintext gets converted to digits, in this case binary digits, and then combined with a pseudo-random stream of cipher digits that we've been calling a keystream. This is particularly close to the Vigenère cipher with a one-time pad implementation, where we would pair up each letter of our plaintext with a random letter from the alphabet and add them together to create the ciphertext.
With stream ciphers on computers, the plaintext digits are encrypted one at a time with the corresponding digit from the keystream, but that random digit is a binary digit, just a 0 or a 1, not a random letter. And instead of adding and modding by 26, or whatever the length of the alphabet was, like we saw with Vigenère, our combining operation is now the exclusive OR operation, or XOR, which we learned about with our binary operations in a previous lesson.
Let's take a look at how this plays out. Say we have the keyword smath, in lowercase, and the plaintext NCSSM, capitalized. We'll combine those in their binary forms using the XOR operation. Remember, that's the plus symbol with a circle around it. Converting both the keyword and the plaintext to binary using 8-bit ASCII, we have these two representations here, and going down the line bit by bit, we perform the XOR operation on each pair of bits to compute the ciphertext in binary.
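The bit-by-bit XOR just described can be sketched in Python. This is only an illustration: the helper names to_bits and xor_bits are made up for this example and are not from the lesson.

```python
def to_bits(text):
    """Return the 8-bit ASCII form of each character, joined into one bit string."""
    return "".join(format(ord(ch), "08b") for ch in text)


def xor_bits(a, b):
    """XOR two equal-length bit strings one bit at a time."""
    return "".join("1" if x != y else "0" for x, y in zip(a, b))


keyword = "smath"
plaintext = "NCSSM"

plain_bits = to_bits(plaintext)
key_bits = to_bits(keyword)
cipher_bits = xor_bits(plain_bits, key_bits)

print(plain_bits)   # plaintext as 40 bits
print(key_bits)     # keystream as 40 bits
print(cipher_bits)  # ciphertext as 40 bits
```

XOR outputs a 1 exactly when the two input bits differ, which is why the helper compares the bits with the not-equal operator.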
And if we take those bits in groups of 8, which we might refer to as 1 byte, we can convert them to their decimal representations. Lucky for us, all of these numbers have a viewable character in ASCII: they become the equal sign, the period, the 2, the single quote, and the percent symbol. So we've encrypted our message.
Now, one benefit of using the XOR operation, and part of the reason stream ciphers use it, is that it's very easy to reverse. Let's see how. Keeping the same keyword, smath, but now using our ciphertext, we again convert both to their binary representations, one character at a time. Remember, we got the ciphertext by XORing the keystream onto the plaintext. If we XOR the same keystream onto the ciphertext, we get the plaintext back. So to undo an XOR with a keystream, we just XOR with the same keystream again. The same operation encrypts and decrypts. Compare that with our previous algorithms, where we would add a key to encrypt and then subtract that key to decrypt, or multiply by something and then multiply by its inverse to decrypt. Here we do literally the exact same operation with the same key to recover our plaintext, which is very handy. And we can confirm that this plaintext, taken 8 bits at a time, returns the decimal values that correspond to capital N, C, S, S, M.
Now, one word of warning about using XOR to combine our keyword and plaintext: we might end up with a binary stream of ciphertext where, when we take it 8 bits at a time, those bytes don't fall in the range 32 to 126, which, remember, are the printable characters in ASCII.
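Here is a small sketch of the round trip and the printable-range check described above. It works on whole character codes with Python's ^ operator, which XORs all 8 bits of a byte at once and so gives the same result as going bit by bit. The helper name xor_text is hypothetical.

```python
def xor_text(text, key):
    """XOR each character's code with the code of the matching key character."""
    return "".join(chr(ord(t) ^ ord(k)) for t, k in zip(text, key))


keyword = "smath"
ciphertext = xor_text("NCSSM", keyword)   # encrypt
recovered = xor_text(ciphertext, keyword)  # decrypt with the very same operation

print([ord(ch) for ch in ciphertext])  # [61, 46, 50, 39, 37]
print(ciphertext)                      # =.2'%
print(recovered)                       # NCSSM

# Check whether every ciphertext byte lands in the printable ASCII range.
is_printable = all(32 <= ord(ch) <= 126 for ch in ciphertext)
print(is_printable)                    # True
```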
So here's an example where we take the same keyword, smath, with a new plaintext, lowercase ncssm. We produce our keystream and plaintext bits the same way, converting each character to its 8-bit ASCII representation, and then XOR those together to produce our ciphertext. But now when we take those ciphertext bits 8 at a time, or 1 byte at a time, we can see that the bytes correspond to the numbers 29, 14, 18, 7, and 5, none of which have printable characters. Remember, the values from 0 through 31 are control codes that governed how the old teletype printers worked, and they don't have a visible character that would appear on the screen. These five control characters are GS for group separator, SO for shift out, DC2 for device control 2, BEL, which is the one I do know (it would actually ring a bell to call the attention of the teletype operator), and ENQ, which is short for enquiry. Some of these have fairly obvious jobs on the machine. Others might take a little more research to understand exactly what a group separator or an enquiry means, but the point here is that we wouldn't actually be able to see any of them on the printout.
So let's dig a little deeper and figure out what happens when we do this with a more modern way of handling text, like Python. If we try to use the chr function that we learned about in previous lessons to turn, say, the number 29 back into an ASCII character, you'll typically get a question mark or a similar placeholder on the screen when you ask Python to print it, because the character has no printable symbol to display. Now, one little note: if you were to actually use your keyboard and mouse to copy and paste that output, that specific question mark would retain a value of 29. I could copy and paste it somewhere else and use the ord function to turn it back into a 29.
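To check those five values, here is a short sketch that XORs the character codes directly. It also shows that chr and ord undo each other even when the character has no visible symbol. What print actually displays for such a character depends on your terminal or notebook, so no printed symbol is shown here.

```python
keyword = "smath"
plaintext = "ncssm"

# XOR the code of each plaintext character with the matching keyword character.
cipher_values = [ord(p) ^ ord(k) for p, k in zip(plaintext, keyword)]
print(cipher_values)  # [29, 14, 18, 7, 5]

# chr(29) has no printable symbol, but the value survives inside the string.
hidden = chr(29)
print(ord(hidden))    # 29
```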
But since all of those question marks would look the same, this is not an ideal way to manage these non-printable characters: just a collection of question marks that we hope we put in the right order and can remember later. It turns out there's a better way to handle characters that aren't printable, one where Python still keeps track of them for us in our string. The way Python does this is by showing the original ASCII value in the string as an escape sequence. Remember escape sequences, if you learned about strings long ago: the ones that have a backslash followed by a special character.
Let's take a look at what that looks like. We can call the ascii function in Python to display the character not as it would be printed, but as it would be represented using its numerical value. So here again is that 29. We convert it to a character using the chr function, and then we pass that to the ascii function, which takes a look at it and decides: if it's printable, I'm going to show you that ASCII character; if it's not printable, I'm going to show you its value. Now, one thing to note is that the value it shows you is in hex. We know that because of the x as the leading character after the backslash, followed by 1d. And because hexadecimal is a base-16 number system, the 1 corresponds to one group of size 16 and the d corresponds to 13 groups of size 1. So 1d is hexadecimal for 29, since 16 + 13 = 29. So again, the benefit is that when we print out our string, we won't have a bunch of indistinguishable question marks for the characters we can't see; instead we'll be shown their hexadecimal values, which lets us tell them apart.
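As a quick sketch of what that looks like, the built-in ascii function returns the escaped form, quotes included. The list of values below is the ciphertext from the lowercase ncssm example.

```python
# Rebuild the non-printable ciphertext from its byte values.
ciphertext = "".join(chr(v) for v in [29, 14, 18, 7, 5])

print(ascii(chr(29)))     # '\x1d'
print(ascii(ciphertext))  # '\x1d\x0e\x12\x07\x05'

# Reading the hex by hand: 1 group of 16 plus 13 ones.
print(int("1d", 16))      # 29
```

Each escape is distinct, so the five characters can be told apart even though none of them has a visible symbol.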
Again, with question marks there might be a bunch of them and we won't be able to tell them apart, but the hexadecimal representations will at least be different for different characters.
Now, where are we going with this? Our goal is ultimately to get a one-time pad stream cipher using this system. Remember, we talked about the one-time pad when we were discussing the Vigenère cipher, where we would choose truly random characters, pair them up with our plaintext, and then use the Vigenère process, character by character, to produce a ciphertext that should be immune to frequency analysis. The issue we ran into there, and it's the same issue we'll have here, is that it's hard to generate random digits. In this case our digits will be ones and zeros, not random letters from the alphabet, but it's still very difficult to do in a way that ensures that everybody, sender and receiver, is using the exact same series of random digits. Again, if they were truly random, we shouldn't be able to recreate them on both ends of the line. We'd expect to get completely different digits each time.
So what we need is a way to generate what we call pseudo-random digits, or pseudo-random characters. That way they're still seemingly random for all intents and purposes, but we can set things up so that we predictably create the same sequence of digits, both when I'm encrypting the message on my computer and when someone is decrypting it on their computer. It turns out Python gives us a few ways to do this, but none of them are going to be perfect. Let's take a look. The first way we'll look at uses the secrets library. You can do import secrets, and then secrets has a function called randbits, and you tell it how many random bits, ones and zeros, you would like it to create.
And then it gives that to you as a decimal integer, so we're using our friend the format function here to show it in binary. And we can see it does, in fact, give us 15 bits that are randomly generated by the computer. The fact that they come from the secrets library means they're actually very secure. The secrets library is what anybody doing this for industry, business, or website security would use to generate random ones and zeros. However, it's not truly random. It is pseudo-random, but because it's intended for random-like data that nobody else should be able to recreate, there is no way to set up this number generator to produce the same stream twice. I've got my 15 bits, but there's really no way for me to recreate those 15 bits again if I wanted to use them to decrypt a message. So we're going to have to go to something a little less secure to implement our random bit selection process.
The other library that comes with Python is called random. We can use random because it allows you to set what's called a seed. A seed is just a way to initialize your pseudo-random number generator so that you can recreate the same quote-unquote random 15 bits, or however many bits you'd like, on demand. So I'm going to set the seed to 4200. It will take in any numeric value. That sets up your number generator so that when you call random.getrandbits immediately after setting the seed, you get the same 15 bits every time you run that code. So if you prime the random generator with the seed and generate 15 bits, you're going to get the same 110001101101 on your screen, assuming you use the same seed value. In that way, the seed is very strongly tied to the key that you end up generating.
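The difference between the two libraries can be sketched as follows. The "015b" format code is my own choice so that leading zeros are kept and the output is always 15 digits wide; a plain "b" would drop them. No specific bit patterns are shown, since the secrets output changes on every run and the seeded output is best checked on your own machine.

```python
import random
import secrets

n_bits = 15

# secrets: suitable for real security, but there is no seed to replay the value.
secure_value = secrets.randbits(n_bits)
print(format(secure_value, "015b"))

# random: not suitable for real security, but a seed makes it repeatable.
random.seed(4200)
first = random.getrandbits(n_bits)

random.seed(4200)  # re-prime the generator with the same seed
second = random.getrandbits(n_bits)

print(format(first, "015b"))
print(first == second)  # True: same seed, same bits
```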
Now, this is all well and good if we have access to a modern computer and the Python programming language, and we understand how to use either secrets or random to generate our random stream of bits. But that wasn't always the case. In fact, this XOR cipher has its origins back in the 1960s, so we'll need to figure out how this was done before modern computers. One of the reasons stream ciphers are used is that it's actually very simple to produce these random streams of bits in hardware, meaning with circuitry or computer processors, without the high-level programming languages we have today. In an upcoming lesson, we'll learn how this can be done with a system that only uses logic gates, resistors, and batteries, something that could have been built pretty easily back in the 1960s.
We'll also see that this premise of XOR ciphers with dedicated hardware generating the streams of bits led the way for many modern encryption systems built on stream ciphers and XOR. One is A5/1, which was the standard for cellular telephone security for a long while. Another is CSS, the Content Scramble System, which encrypted commercial DVD discs using a stream cipher built on an XOR operation. Another is WEP, one of the early Wi-Fi security standards, which was also used for quite a long time and is built on a stream cipher with an XOR operation. All of these can run without a full-blown computer, on just a tiny microprocessor in your DVD player, your cell phone, or the Wi-Fi router at your house. Now, one other thing they all have in common is that they're all insecure. They've all been broken. So we'll see that stream ciphers, and this way of XORing with generated random bits, have to be done just right in order to maintain security.
Remember, the goal is to create a pseudo-random stream of 1s and 0s that is, for all intents and purposes, as good as a truly random sequence of 1s and 0s like a one-time pad, without all the hassle that goes along with a one-time pad. The fact that these have all been broken shows that this is really hard to do, but maybe we can learn from some historical applications to figure out how we could do better going forward.
So that's it for our introduction to stream ciphers. What I hope you've taken away is that we can take our plaintext message, convert it to a binary representation, and then work toward creating a keystream of random, or in our case pseudo-random, 1s and 0s to XOR with our message bit by bit to create the ciphertext. Where appropriate, we can convert that new stream of bits back to characters using ASCII, and if we end up with an ASCII character that isn't printable, we can use the ascii function in Python to see its hexadecimal representation instead of a question mark.
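Putting those pieces together, here is one possible sketch of a seeded XOR stream cipher. It is an illustration only, not the method from an upcoming lesson: the function names keystream_bytes and xor_stream are hypothetical, the keystream is generated a byte at a time for convenience, and Python's random module is not cryptographically secure, so this should not be used to protect real data.

```python
import random


def keystream_bytes(seed, length):
    """Generate `length` pseudo-random byte values from a seed (NOT secure)."""
    rng = random.Random(seed)  # separate generator, so global state is untouched
    return [rng.getrandbits(8) for _ in range(length)]


def xor_stream(text, seed):
    """XOR each character's code with the matching keystream byte."""
    stream = keystream_bytes(seed, len(text))
    return "".join(chr(ord(ch) ^ k) for ch, k in zip(text, stream))


seed = 4200  # sender and receiver must share this value
ciphertext = xor_stream("NCSSM", seed)

print(ascii(ciphertext))             # non-printable bytes appear as hex escapes
print(xor_stream(ciphertext, seed))  # NCSSM
```

Because the same seed reproduces the same keystream, calling xor_stream a second time with that seed undoes the encryption.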