In this lesson, we'll look at how we can encode characters, or text, for digital communication between computers and other digital systems. The history of this actually goes back to the 1920s and the invention of the teletype printer. The teletype printers we can see in the top right are essentially just very large typewriters, but they had the benefit of being linked to a corresponding unit that could be placed far away, maybe several miles away or across a state or a country, connected by radio or telephone lines. Whatever operation was performed on one of these units would be mirrored exactly by the paired unit on the other side of the communication link.
Additionally, as you typed, a record of what you typed was not only printed on paper at both the sending unit and the receiving unit, but was also recorded on a paper tape that had holes punched into it, and the pattern of holes corresponded to the characters or operations that were sent over the radio or telephone line. This is a five-bit system. You can see a row of very small circles down the middle of the tape that just keeps it oriented correctly, and up to five larger circles were used for each character.
Over time, different competing standards emerged, but one ultimately won out, and it was known as the International Telegraph Alphabet Number 2, which we'll abbreviate as ITA2. Here you can see the different characters that corresponded to the different configurations of five bits. Now, you might realize that with a five-bit system you should only be able to get 32 characters, since there are two choices for each of the five positions, but through a clever use of this letters and figures switch, we could get essentially double the number of characters.
So for example, if we were in letters mode, the pattern hole, hole, hole, paper, hole, or in binary 11101, would correspond to the letter Q. But if I had hit this figures key, which is almost like the shift key on your keyboard, it would tell the typewriter to shift into the figures row, where that same code would instead be interpreted as the numeral 1. So you could flip back and forth and reuse these codes to get a different collection of characters, or what they called figures; we might call those symbols now. You can also see some entries here that don't correspond to a printed character at all, like carriage return and line feed. Those just control how the typewriter works. A carriage return moves the print position back to the start of the line, and a line feed advances the paper by one line without printing anything; together they do roughly what hitting enter does today.
Here's a message that uses the ITA2 system. Maybe pause the video and see if you can figure out what it corresponds to. We can see that it corresponds to our course number: M, A, then the figures code shifts into figures mode so that the next signal is read as a slash and not the letter X, then a shift back into letters mode for C, S, and then back into figures mode for 4200.
So this system was used for quite a while, but you can see that using only five bits really limits the number of characters and symbols available to send. As we started moving away from electromechanical systems like the typewriter toward purely electrical systems like the computer, there was no need to keep this hard-coded limit. People started to look for a larger system, which ultimately led to the American Standard Code for Information Interchange, or ASCII. In this table on the right, we can see a 7-bit encoding system.
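The letters and figures shift described above behaves like a small state machine, and a rough Python sketch may make that concrete. The table here is deliberately partial and is an assumption: the codes are written in the lesson's order (hole = 1, paper = 0) and taken from the commonly published ITA2 chart, so check them against the chart on screen. The names `decode_ita2`, `LETTERS`, and `FIGURES` are hypothetical.

```python
# Shift codes (assumed from the commonly published ITA2 chart).
LTRS = "11111"  # switch to letters mode
FIGS = "11011"  # switch to figures mode

# Partial lookup tables: only the codes needed for this illustration.
LETTERS = {
    "11101": "Q",
    "10111": "X",
    "00111": "M",
    "11000": "A",
    "01110": "C",
    "10100": "S",
}
FIGURES = {
    "11101": "1",  # same code as Q, read in figures mode
    "10111": "/",  # same code as X, read in figures mode
}


def decode_ita2(codes):
    """Decode a sequence of 5-bit strings, tracking the current shift mode."""
    table = LETTERS  # assume the machine starts in letters mode
    out = []
    for code in codes:
        if code == LTRS:
            table = LETTERS
        elif code == FIGS:
            table = FIGURES
        else:
            out.append(table[code])
    return "".join(out)


# The same 5-bit code prints differently depending on the mode.
print(decode_ita2(["11101", FIGS, "11101"]))
print(decode_ita2(["00111", "11000", FIGS, "10111", LTRS, "01110", "10100"]))
```

The first call should print Q1 and the second MA/CS, which shows why the shift codes have to be sent inside the message itself: the receiver has no other way to know which row of the chart to read.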
So 7 bits means we can have up to 128 different codes. These ASCII codes represent letters, but also numerals and symbols. In fact, the first two columns contain nothing that would be printed at all. They're just more elaborate controls that could be sent to the computer or the teletype printer using this new 7-bit system. So say you wanted to encode the letter f. You'll notice we now have a difference between a lowercase and an uppercase F, so first you would find the lowercase f in the body of the table. We'd go across the row to find bits 4, 3, 2, and 1, which are 0, 1, 1, 0, with bit 1, the least significant bit, on the right-hand side. Then we would follow f up to the top of the table and see the bits 1, 1, 0, which are bits 7, 6, and 5. Those go at the front, in that order, to create the binary representation of the lowercase letter f: 1100110. We could do the same thing for the numeral 2. The least significant bits are 0010 and the most significant are 011, and combining them gives 0110010. And likewise for the space.
So this new 7-bit system allowed for uppercase characters, lowercase characters, symbols, and numerals. It greatly expanded the amount of information that we were able to send over these new digital systems, and for a while this 7-bit system really was sufficient. So we'll look at an example here that uses the 7-bit system to encode the message Go Unis. Here we have our letters all spaced out, and we'll use a capital SP to represent the space between the two words. Then we can use the ASCII table to convert each of these characters into the corresponding 7-digit binary number. And if we wanted, we could convert those to their decimal representations. It just depends on how you intend to use them, whether on the computer or by hand, but we'll have both representations available to us.
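The table lookup walked through above, where the column header supplies bits 7 to 5 and the row label supplies bits 4 to 1, can be sketched in a few lines of Python. The helper name `ascii_from_table` is hypothetical, and the built-in chr call is used only to check the result.

```python
def ascii_from_table(column_bits, row_bits):
    """Join the column header (bits 7-5) and row label (bits 4-1) into one code."""
    bits = column_bits + row_bits   # most significant bits go in front
    return bits, int(bits, 2)       # 7-bit string and its decimal value


# Lowercase f: column 110, row 0110.
f_bits, f_decimal = ascii_from_table("110", "0110")
print(f_bits, f_decimal, chr(f_decimal))

# Numeral 2: column 011, row 0010.
two_bits, two_decimal = ascii_from_table("011", "0010")
print(two_bits, two_decimal, chr(two_decimal))
```

This should give 1100110 (decimal 102) for f and 0110010 (decimal 50) for the numeral 2, matching the values read off the table by hand.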
Now the question is, how would we store this ASCII information? There are a couple of ways that might seem natural, and both have been used depending on the context. The first is to store it as one long string of ones and zeros, which, if we wanted to, we could convert to a decimal representation using the int function. So all of those bits become one very large decimal number. I could send that along, and somebody could convert it back to binary and then read off every 7 bits to convert each group back into a character. That would work. We could also use Python to store each of these decimal or binary numbers in a list. Then we have them collected nicely, with a very visual way to see how they're separated, and we don't need to think about parsing them out 7 bits at a time. We can let the computer store them individually for us. We might look at which of these is better under different circumstances later in this course, but for now just know there are many ways to store our ASCII information, and the computer can work with it regardless of how we store it. We just need to be consistent in the implementation.
Now, how could we ask Python to do a little more for us? Instead of going into this table and looking everything up, Python has functions that can do that for us. The two functions are the ord function, spelled o-r-d, and the char function, spelled c-h-r. We can see them in use here. The ord function takes a string containing a single character. You can't give it a whole word; it has to be one character at a time. It will tell you the decimal value of that character according to the ASCII table. We can also go the other way: give the chr function a numerical value, and it will tell you which ASCII character corresponds to that value.
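As a rough sketch of the two storage options described above, the following uses the built-in ord, chr, format, and int functions on the Go Unis message. The helper names `decode_bits` and `decode_number` are hypothetical, and padding the integer back to a multiple of 7 bits is one assumed way to restore leading zeros that int drops.

```python
message = "Go Unis"

# Option 1: a list, one entry per character.
codes = [ord(ch) for ch in message]               # decimal values
bit_groups = [format(c, "07b") for c in codes]    # 7-bit binary strings

# Option 2: one long bit string, which can also be viewed as one big integer.
bit_string = "".join(bit_groups)
big_number = int(bit_string, 2)


def decode_bits(bits, width=7):
    """Split a bit string into fixed-width groups and map each to a character."""
    groups = [bits[i:i + width] for i in range(0, len(bits), width)]
    return "".join(chr(int(group, 2)) for group in groups)


def decode_number(number, width=7):
    """Rebuild the bit string from an integer, then decode it."""
    bits = format(number, "b")
    padded_len = -(-len(bits) // width) * width   # round up to a multiple of width
    return decode_bits(bits.zfill(padded_len), width)


print(codes)
print(bit_groups)
print(decode_bits(bit_string))
print(decode_number(big_number))
```

The list keeps the character boundaries explicit, while the single bit string or integer relies on both sides agreeing that every character uses exactly 7 bits. That agreement is the consistency the lesson asks for.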
In the chr example on screen, the result is the string lowercase o. We could combine these with other Python functions we've seen in recent lessons, such as the format function, in case we wanted to see the space character not as a decimal but as a binary representation padded out to seven characters, or seven bits: 0100000. Or we could use the int function to take a binary string and convert it to a decimal, which we could then pass to the chr function to learn that the string 1010101 represents the character uppercase U.
Over time, other countries began to implement their own versions of ASCII. Remember, the A in ASCII stands for American, and other countries have other character needs. We might not see these much in English, but accents and other symbols are really important for other languages to represent their written word with fidelity. So the ASCII system had to be extended. It's not a great name to remember, but ISO/IEC 8859-1, often just called Latin-1, is an 8-bit representation. We expanded from seven bits to eight bits, which doubles the number of available codes from 128 to 256. You can see that the first 128 characters are exactly the same as ASCII. So if your eight bits start with a zero, the next seven bits mean whatever they meant in ASCII. But now, if the first bit is a one, we have access to all of these As with accents, Os with accents, and so on. So it was nice forward thinking, and it was backwards compatible with the existing ASCII system. As a result, ASCII characters are now represented using eight bits instead. We won't really see seven-bit ASCII, quote-unquote, in the wild anymore; we're going to use eight bits instead.
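To illustrate the backwards compatibility described above, here is a small sketch using Python's built-in "latin-1" codec. The helper name `latin1_bits` is hypothetical, and the accented e is just one example character from the upper half of the Latin-1 table.

```python
def latin1_bits(ch):
    """Return the single Latin-1 byte for ch as an 8-bit binary string."""
    return format(ch.encode("latin-1")[0], "08b")


# An ASCII character: the 8-bit form is the 7-bit form with a leading zero.
seven_bit = format(ord("Q"), "07b")
eight_bit = latin1_bits("Q")
print(seven_bit, eight_bit)

# An accented character: the leading bit is 1, so it has no 7-bit ASCII form.
e_acute = "\u00e9"  # e with an acute accent
print(latin1_bits(e_acute))

# For ASCII characters, the ASCII and Latin-1 encodings give identical bytes.
print("Q".encode("ascii") == "Q".encode("latin-1"))
```

The leading bit acts as the switch: 0 means the remaining seven bits are read exactly as in ASCII, and 1 selects one of the 128 added codes.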
And for a time this was really great, until we had even more complex character needs for the written word on the computer. That's where Unicode comes in. Unicode is today's standard for text encoding. It makes sure we have a consistent way, across countries and cultures, of talking about the same written character without having to worry about a local implementation that might differ from somewhere else. UTF-8 is an encoding defined by the Unicode standard, and it's called a variable-width character encoding. That means that sometimes it uses eight bits for a character and sometimes it uses up to 32 bits, depending on what the character is. You don't have to dig into the details in this course, but it's a really smart design: you can tell from the very first set of eight bits whether your character or symbol is going to use 8, 16, 24, or 32 bits. There's a nice little system in place. For the purposes of this course, just know that UTF-8 gives us access to over a million characters, including all of the ones that were in ASCII. In fact, it is also backwards compatible with ASCII. So if you had an 8-bit UTF-8 character, it would be exactly the same as the 8-bit Latin-1 character, and the same as the 7-bit ASCII character with a leading zero. All of those would return, say, the letter Q back to you from the same numerical value. And in fact, what Python is really doing when we use the ord and chr functions is converting between a character and its Unicode value, not its ASCII value, but remember, for the ASCII characters they're numerically the same. So Unicode gives us access to all sorts of different language characters. It also has emojis, like the one up on the screen. A lot of the written word is stored using a Unicode encoding.
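The lesson leaves the details of the variable-width scheme aside, but a short sketch can show the idea of reading the length from the first byte. The leading-bit patterns below come from the UTF-8 definition rather than from the lesson, and the helper name `utf8_length` and the sample characters are hypothetical choices for illustration.

```python
def utf8_length(first_byte):
    """Return how many bytes a UTF-8 character uses, given its first byte."""
    if first_byte >> 7 == 0b0:
        return 1   # 0xxxxxxx: same as ASCII
    if first_byte >> 5 == 0b110:
        return 2   # 110xxxxx
    if first_byte >> 4 == 0b1110:
        return 3   # 1110xxxx
    if first_byte >> 3 == 0b11110:
        return 4   # 11110xxx
    raise ValueError("not a valid first byte of a UTF-8 character")


samples = [
    "Q",            # plain ASCII letter
    "\u00e9",       # e with an acute accent
    "\u20ac",       # euro sign
    "\U0001F600",   # grinning face emoji
]

for ch in samples:
    encoded = ch.encode("utf-8")
    bits = " ".join(format(b, "08b") for b in encoded)
    print(ord(ch), len(encoded), utf8_length(encoded[0]), bits)
```

For each sample, the length predicted from the first byte should match the actual number of bytes, running from one byte (8 bits) for Q up to four bytes (32 bits) for the emoji.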
So we're going to continue to use Unicode, but since we're working in English in this course, it may be easier for our purposes to think of it as ASCII if you want, because the table is certainly easier to wrap your head around than the full Unicode system, and it gives us the same numerical values.
So that is it for this lesson. We should now know how to take written characters from any language, and emojis if we wanted, and convert them to a numerical representation using the UTF-8 Unicode standard, which lets us see them in binary or, if we wanted, in decimal. That will help us when we start to implement systems that actually encrypt these messages. Remember, nothing we've learned here makes these messages secure. We're just converting them from visual characters to numerical values. Next up, we'll use those numerical values with an encryption system to ensure that people who might intercept one of these messages don't get immediate access to our secret information.