Hey everybody, Mr. Gibson here with your next lesson in cryptography. Today we're going to talk about how we can represent text characters as numerical values. We're going to build on what we've done previously in the course, where we've dealt only with the 26 capital letters of the English alphabet, numbered 0 through 25, and find a way to bring that into the modern text space. When we work with digital media, like typing on a computer, there are a lot more than 26 characters that we might wish to encrypt: punctuation, lowercase letters, and characters from other languages. So we're going to talk about how to represent those as numbers. The good news is that this is already a solved problem. There are a lot of international standards for doing it, and we're going to look at a few of the ones that have developed over time. The very first standard had to do with teletype printers. This is an old photograph, I believe from the Second World War, of somebody operating one of these printers. It was a mechanical device driven by electrical signals sent down a telegraph line, and those signals told the printer what to do. An instruction might be to print a certain character or to move the print head to a different location, so the code was a mix of character commands and print-head controls for formatting, like tabs and spaces. The standard that emerged is ITA2, the International Telegraph Alphabet No. 2, which is a five-bit encoding. You can see from the picture that this was done on a long, narrow strip of paper tape, with holes punched in rows of five positions, so each row was five bits.
Each position was either punched or not punched, the machine read five bits at a time, and the pattern of holes decided what the printer would do. For example, here we've got hole, hole, hole, no hole, hole, which is equivalent to the binary 11101. That would print either the character Q or the number 1, depending on which mode the machine was in. You can see over here there are a couple of shift commands: 11011 switches the machine to the figures set, which holds symbols and numbers, and 11111 switches it back to the letters set. You can imagine this was a loud and slow process: a room full of tape punches producing these commands, which were sent over the telegraph, punched again on the receiving end, and fed into the machine to control the printer. It was a great system for its time, because it made use of the telegraph network that was already in place, but it fell out of use as computing came around and telegraphs gave way to computer networks. When computers arrived in the 1960s, a new standard for electronic communication took over: the American Standard Code for Information Interchange, or ASCII, usually pronounced "ASK-ee." ASCII is a seven-bit character encoding, so you can see it has a lot more options. There are still some legacy holdovers from the old printers, because the first computers tried to mimic on screen what a printer would do. The first two columns here are characters we don't recognize, because they're not printable characters.
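The way a teleprinter read ITA2 tape can be sketched in a few lines of Python. This is a minimal illustration, not a full ITA2 implementation: the lookup tables are a small excerpt holding Q/1 from the lesson plus W/2 and E/3, which are assumed from the common ITA2 chart, and the function name `decode_ita2` is hypothetical. The point it shows is why the shift codes matter: the same five bits, 11101, print differently depending on the current mode.

```python
# Partial ITA2 excerpt (assumed from the common ITA2 chart), not the full alphabet.
LETTERS = {"11101": "Q", "11001": "W", "10000": "E"}
FIGURES = {"11101": "1", "11001": "2", "10000": "3"}
LETTERS_SHIFT = "11111"  # switch to the letters set
FIGURES_SHIFT = "11011"  # switch to the figures set


def decode_ita2(tape: list[str]) -> str:
    """Decode 5-bit ITA2 rows, tracking the letters/figures shift state."""
    table = LETTERS  # assumption: the machine starts in letters mode
    out = []
    for row in tape:
        if row == LETTERS_SHIFT:
            table = LETTERS
        elif row == FIGURES_SHIFT:
            table = FIGURES
        else:
            out.append(table[row])  # same bits, different meaning per mode
    return "".join(out)


tape = ["11101", "11001", "11011", "11101", "11001", "11111", "10000"]
print(decode_ita2(tape))  # QW12E
```

The shift rows themselves print nothing; they only change which table the next rows are looked up in.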
Those first two columns are control characters: things like tab, advancing the paper, or moving the print head, lots of things that we can't actually see. For an example with a character we can see, look at lowercase f. It's in row 6, and the row label 0110 gives bits 4, 3, 2, and 1, counting from right to left. Those are the low four bits of the character. It's also in column 6, and the column label gives the three remaining bits, 110, which are bits 7, 6, and 5. Putting the column bits in front of the row bits, 110 followed by 0110, gives 1100110, the 7-bit code for lowercase f. There are digits in the ASCII table too. The character 2 is in row 2, which is 0010, and column 3, which is 011; combining those gives the 7-bit code 0110010. The entry SP stands for space, which is helpful: up until now we've had to remove all the spaces from our plaintext before encrypting, because we didn't have a numerical value for them. Now we can leave them in. Space is in row 0, so its low four bits are 0000, and in column 2, which is 010, so it's represented in binary as 0100000. So there are a lot of possibilities here, and we won't have to clean up our text anymore: any printable character in the table has a binary representation. In practice, these 7-bit codes are stored using 8 bits. We do that by putting a 0 in front of the 7-bit code, which is called padding. We pad every code out to 8 bits for predictability: we want every character to be represented by the same number of bits.
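The row-and-column lookup described above amounts to simple bit arithmetic: the column supplies bits 7 to 5 and the row supplies bits 4 to 1, so the code is the column shifted left by four, combined with the row. Here is a small sketch; `ascii_code` is a hypothetical helper name, and Python's built-in `ord` is used only to check the result against the real ASCII values.

```python
def ascii_code(column: int, row: int) -> int:
    """Combine a table column (bits 7-5) and row (bits 4-1) into a 7-bit code."""
    return (column << 4) | row


for char, column, row in [("f", 6, 6), ("2", 3, 2), (" ", 2, 0)]:
    code = ascii_code(column, row)
    assert code == ord(char)  # agrees with the character's actual ASCII value
    # 7-bit code as read from the table, then padded with a leading 0 to 8 bits
    print(repr(char), format(code, "07b"), format(code, "08b"))
# 'f' 1100110 01100110
# '2' 0110010 00110010
# ' ' 0100000 00100000
```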
Fixed 8-bit groups mean the receiving end can parse the message 8 bits at a time and convert each group back to its character. If some characters were 3 bits, some 5, and some 8, and they all got mushed together, the receiver wouldn't know where one character ends and the next begins. To keep that easy and predictable, we'll always use 8 bits for ASCII. Here's an example. Say we want to send the message "go unis", with a space between the words. We convert it to binary one character at a time using the table; I've written SP here for the space. Then we concatenate the results left to right, just like you would write the message out on paper. What's interesting about this binary representation is that when we put these 7 bytes together (a byte is 8 bits), we can convert the whole thing to decimal if we want to. This long decimal number is the same thing: that one number represents the message "go unis". We could also convert it to hexadecimal. So there are several ways to represent the string "go unis" in different number systems, and which one we use depends on the algorithm we want to encrypt it with. Some algorithms we'll learn work at the binary level, others at the decimal level, and others at the hexadecimal level. Being able to convert back and forth between these bases gives us access to a range of ciphers we might want to use. The problem with using ASCII as our only way of turning letters into numbers is those control characters I mentioned earlier. When we encrypt an ASCII message at the binary level, the 8-bit groups of the ciphertext may map to one of those control characters, which we can't actually see.
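The "go unis" walkthrough can be reproduced as a short sketch: encode each character as 8 bits, concatenate, read the result as one integer in decimal or hexadecimal, and parse it back 8 bits at a time. The helper names `to_bits` and `from_bits` are hypothetical. One detail worth noting: converting the big integer back to binary drops leading zero bits ("g" starts with a 0 bit), so the sketch pads back to 56 bits, 7 bytes times 8 bits, before splitting.

```python
def to_bits(message: str) -> str:
    """Encode each ASCII character as 8 bits and concatenate left to right."""
    return "".join(format(byte, "08b") for byte in message.encode("ascii"))


def from_bits(bits: str) -> str:
    """Split the bit string into 8-bit groups and map each back to a character."""
    chunks = [bits[i:i + 8] for i in range(0, len(bits), 8)]
    return "".join(chr(int(chunk, 2)) for chunk in chunks)


bits = to_bits("go unis")
as_decimal = int(bits, 2)         # the whole message as one big integer
as_hex = format(as_decimal, "x")  # the same value in hexadecimal
print(len(bits) // 8, "bytes")    # 7 bytes
print(as_decimal)
print(as_hex)                     # 676f20756e6973

# Going back: re-pad to 56 bits, since int-to-binary drops leading zeros.
restored = format(as_decimal, "056b")
assert restored == bits
print(from_bits(restored))        # go unis
```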
Those control characters, again, are holdovers from the old printer days, telling the printer to tab over or scroll down the page. They're hard for us to transmit because we can't see them. So we're going to pair ASCII with another representation called Base64. Base64 is a six-bit system, and it's designed specifically to store binary data as readable text; that's the whole reason it exists. It became really popular with the internet, because there's a lot of binary data we like to send, like pictures, over communication channels that were built for text. Email, for example, was designed to carry text rather than raw binary, but you get pictures in your email all the time. So how does that work? The binary image is converted into Base64 so it can be sent through the text channel, and your computer knows to convert it back to binary on the receiving end. Here's how this might work when we combine ASCII and Base64. Say we've got the message "Bye": capital B, lowercase y, lowercase e. We convert that to ASCII using the table we just saw, and then we encrypt it somehow. We don't know any of these algorithms yet, so let's just assume these are the encrypted bits from that message; we'll learn a few of those algorithms in the next section. The problem with these encrypted bits is that none of the 8-bit groups maps back to a character in the ASCII table, not even one of the control characters. ASCII only defines 128 codes, 0 through 127, so any 8-bit group that starts with a 1 falls outside the table. The way to fix that is to regroup the bits into 6-bit chunks, and Base64 guarantees that every 6-bit combination has a character we can see.
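The regrouping step can be sketched in Python and checked against the standard library's `base64` module. The transcript doesn't spell out the encrypted bits, so this sketch assumes they are the three bytes that "mtKt" decodes back to (0x9A, 0xD2, 0xAD); the encryption itself isn't shown. `regroup_base64` is a hypothetical helper that only handles inputs whose length is a multiple of 3 bytes, so it never needs Base64's `=` padding.

```python
import base64

# Base64 alphabet: values 0-25 -> A-Z, 26-51 -> a-z, 52-61 -> 0-9, 62 -> '+', 63 -> '/'
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def regroup_base64(data: bytes) -> str:
    """Regroup 8-bit bytes into 6-bit chunks and map each to a Base64 character.

    Simplified: requires len(data) to be a multiple of 3 (no '=' padding).
    """
    if len(data) % 3 != 0:
        raise ValueError("sketch only handles multiples of 3 bytes")
    bits = "".join(format(byte, "08b") for byte in data)
    return "".join(ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6))


# Assumed ciphertext: the 24 bits that "mtKt" decodes to.
ciphertext = bytes([0x9A, 0xD2, 0xAD])
print([format(b, "08b") for b in ciphertext])  # every byte starts with 1
print(ciphertext.isascii())                     # False: values above 127
print(regroup_base64(ciphertext))               # mtKt

# Cross-check against the standard library, in both directions.
assert regroup_base64(ciphertext) == base64.b64encode(ciphertext).decode("ascii")
assert base64.b64decode("mtKt") == ciphertext
```

Three 8-bit bytes make exactly four 6-bit chunks, which is why 24 bits of ciphertext come out as four visible characters.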
Using the Base64 table on the previous slide, we could send the message as lowercase m, lowercase t, uppercase K, lowercase t, and we know the person receiving it will get those 24 bits exactly as intended. They can then decode the Base64 back into those 24 bits, decrypt them, and read the result as 8-bit ASCII to get the original message "Bye". There are a lot of other systems in place, and the most popular one is Unicode. Unicode was built as an extension of ASCII, so it's very compatible with it: in fact, the first 128 characters in Unicode are the 128 characters in ASCII. But it goes much further. Unicode encodings can use up to 32 bits for a single character, and all those possibilities let us send far more than the characters that have been used historically, including things like emojis. For example, the emoji on the screen is represented with 32 bits. A Unicode character is usually written as a capital U, a plus sign, and then the hexadecimal number that identifies the character. But we're not going to use Unicode in this course; we'll be using a combination of ASCII and Base64. So that's it. That's how we're going to represent our text as numbers. We've seen how to convert any character in the ASCII table into an 8-bit number, and how to take any combination of bits and turn it into Base64. We'll look at more details of that conversion process in a future lesson, but that's it for today. Thanks for watching, and we'll catch you in the next one.