In this video, we're going to explain the techniques that computer systems usually use to encode symbols. Let's start with a set of symbols that we'll use as an example: three colors, blue, red, and green. These are the three symbols we want to encode. Typically we don't want to encode a single symbol but a whole set of them, and this notion of a set is important. There are two ingredients we need to define to encode a set of symbols. Number one, we have to decide how many bits we're going to use in the encoding, that is, the size of our encoding. Number two, we need to define a correspondence between the symbols and their binary encodings. So let's follow the example again. Our set is blue, red, and green. For the size of the encoding, measured in number of bits, you need to remember that if I choose to use n bits, then with n bits I can have up to 2 to the n different combinations. So we need to make sure that 2 to the n is at least as big as the cardinality of the set. In other words, if I decide to use n bits, 2 to the n must be greater than or equal to the number of elements in our set. In our example the set has three elements, so this is very easy. If I take two bits, I have 2 to the 2, which is 4, and 4 is greater than or equal to the cardinality of my set, which is 3. So with two bits I can encode these three symbols. I couldn't encode them with a single bit, because one bit only gives me two combinations, 0 and 1. Of course I could use many more bits, like 10 or 15, and that doesn't matter, because as long as I have at least three possible combinations, the encoding is possible. So that's step number one: I've decided that I'm going to use two bits.
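As a quick check of step one, here is a minimal Python sketch that computes the smallest n such that 2 to the n is at least the size of the set. The helper name `min_bits` is our own, hypothetical choice, and we assume at least one bit is used even for a one-element set.

```python
def min_bits(num_symbols: int) -> int:
    """Smallest n such that 2**n >= num_symbols (assuming at least 1 bit)."""
    if num_symbols < 1:
        raise ValueError("the set must contain at least one symbol")
    # (num_symbols - 1).bit_length() is the number of base-2 digits needed
    # to write the largest index, num_symbols - 1.
    return max(1, (num_symbols - 1).bit_length())

colors = {"blue", "red", "green"}
n = min_bits(len(colors))
print(n, 2 ** n)  # 2 4
```

For the three colors it returns 2 bits, which give 4 combinations, matching the choice made above.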
Step number two: I need to define the correspondence between symbols and binary codes, and this correspondence is typically arbitrary. In our example we can decide to encode blue as 00, red as 01, and green as 11, and leave the combination 10 unused, which is fine. Some combination has to remain unused here, because 2 to the 2, which is 4, gives me more combinations than the set has elements. This is a very simple example, but computers already use encodings of this kind, for example to encode characters. How do we encode characters? It's very similar to what we did here. First we agree on the number of bits, and then we define a correspondence, which is typically a table with two columns: in one column we have the symbol, and in the other its binary combination. There are different encodings for characters. A classic one is ASCII, which represents each character with 7 bits; later extensions of it use 8 bits. But 8 bits are not enough for all the characters in use, because they only allow 256 combinations, so a different standard, called Unicode, appeared in order to accommodate the symbols of all the languages on Earth, and it has different implementations. One of them is called UTF-8, which uses units of 8 bits; another is UTF-16, which uses units of 16 bits; and another is UTF-32, which uses units of 32 bits. So the name of each of these three encodings already tells you the size of its basic unit (in UTF-8 and UTF-16 a single character may take more than one unit), and all of them need, as we required in step number two, a table that defines the correspondence. So these are two examples, a trivial one and a much more realistic one, of how computers encode symbols. Let's now work out a totally different example.
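The two-bit color table, and the idea that the number in UTF-8, UTF-16, and UTF-32 is the size of a code unit, can both be sketched in Python. The dictionaries and the functions `encode_colors`, `decode_colors`, and `encoded_sizes` are hypothetical helpers for illustration; `str.encode` with the `utf-8`, `utf-16-le`, and `utf-32-le` codecs is standard Python.

```python
# Hypothetical 2-bit table for the color example; the mapping itself is arbitrary.
COLOR_TO_CODE = {"blue": "00", "red": "01", "green": "11"}   # "10" stays unused
CODE_TO_COLOR = {code: color for color, code in COLOR_TO_CODE.items()}

def encode_colors(colors):
    """Concatenate the 2-bit code of each color."""
    return "".join(COLOR_TO_CODE[c] for c in colors)

def decode_colors(bits):
    """Split the bit string into 2-bit chunks and look each one up."""
    return [CODE_TO_COLOR[bits[i:i + 2]] for i in range(0, len(bits), 2)]

def encoded_sizes(ch):
    """Bytes used by one character in UTF-8, UTF-16 and UTF-32 (little-endian, no byte-order mark)."""
    return tuple(len(ch.encode(codec)) for codec in ("utf-8", "utf-16-le", "utf-32-le"))

bits = encode_colors(["red", "green", "blue"])
print(bits)                  # 011100
print(decode_colors(bits))   # ['red', 'green', 'blue']

# 'A', 'é' (U+00E9), and an emoji (U+1F600) outside the 16-bit range:
# sizes are (1, 2, 4), (2, 2, 4) and (4, 4, 4) bytes respectively.
for ch in ("A", "\u00e9", "\U0001F600"):
    print(repr(ch), encoded_sizes(ch))
```

Decoding the unused combination 10 raises a `KeyError`, because it has no entry in the table. The sizes show that UTF-8 and UTF-16 may spend more than one unit on a character, while UTF-32 always uses exactly one 32-bit unit.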
This is going to be a set of symbols that we invent ourselves, and we'll call it UAL-1. Here is how I define it. My symbols are words made of three elements written one next to the other. The first element is what I'm going to call the code, and the other two elements, written to the right of the code, are the operands, which I'll call operand 1 and operand 2. The code can have four possible values: add, sub, mul, and div. In other words, in the first portion of every symbol of my set, only one of these four words can appear. The other two fields of the symbol, the operands, can hold any natural number between 0 and 255, both included. That's the definition of my set. Let's look at some examples. One symbol would be add 53 28. This is one possible symbol of my set; there are plenty of others. As you can see, we recognize the three portions I described. The first one indeed has one of the four possible values. The second one is 53, which is a number between 0 and 255. And the third one is 28, which again is a number between 0 and 255. Let's take another example: mul 12 11 is another symbol of this set, and again we see the three fields we described. Excellent. Now let's try to encode this set. For that, we need to satisfy the two requirements from before: first decide how many bits we're going to use, and then define the correspondence between symbols and binary encodings. Remember, to decide how many bits, I first need to know how many elements are in my set. So let's do a quick calculation to find out how many elements there are.
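To make the definition of UAL-1 concrete, here is a minimal Python sketch that checks whether a piece of text is a member of the set, with the four codes add, sub, mul, and div. It assumes the three fields are written as text separated by spaces; `Symbol` and `parse_symbol` are hypothetical names, not part of the lecture.

```python
from typing import NamedTuple

OPCODES = ("add", "sub", "mul", "div")   # the four allowed values of the code field

class Symbol(NamedTuple):
    code: str
    operand1: int
    operand2: int

def parse_symbol(text: str) -> Symbol:
    """Parse text such as 'add 53 28' and check the UAL-1 field rules."""
    parts = text.split()
    if len(parts) != 3:
        raise ValueError(f"expected 3 fields, got {len(parts)}")
    # int() itself raises ValueError if an operand is not a whole number.
    code, op1, op2 = parts[0], int(parts[1]), int(parts[2])
    if code not in OPCODES:
        raise ValueError(f"unknown code {code!r}")
    for value in (op1, op2):
        if not 0 <= value <= 255:
            raise ValueError(f"operand {value} is outside 0..255")
    return Symbol(code, op1, op2)

print(parse_symbol("add 53 28"))  # Symbol(code='add', operand1=53, operand2=28)
print(parse_symbol("mul 12 11"))  # Symbol(code='mul', operand1=12, operand2=11)
```

Text such as "mod 1 2" or "add 300 0" raises `ValueError`, because it is not a symbol of the set.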
How do we calculate that? In every symbol, the code can take four possible values. For each of them, operand 1 can take 256 possible values, between 0 and 255. And for each of those combinations, operand 2 can take another 256 values, between 0 and 255. So the simple formula for the number of symbols is 4 multiplied by 256 multiplied by 256: I can combine four values here, 256 here, and 256 here. If you do the math, that's 2 to the 2 times 2 to the 8 times 2 to the 8, a total of 2 to the 18 symbols. As you can see, I did this on purpose: the number of possible values in each field is a power of 2, so the arithmetic is very easy, because I can state the number of elements of this specific set as a power of 2. Combining this with the rule from before, it tells me that I can encode the set with, pay attention here, at least 18 bits. I cannot use fewer than 18 bits, because I have a total of 2 to the 18 elements in my set. So keep this "at least 18 bits" in mind. We haven't defined the correspondence yet, and here is the scheme we're going to follow. Point number one, we decide to use 24 bits rather than 18. You'll agree with me that 24 bits is at least 18, so that choice is valid. Then we need to define the correspondence, and we'll do it with three rules. The first rule is that the codes add, sub, mul, and div are encoded with two bits using these combinations: add is 00, sub is 01, mul is 10, and div is 11. The second rule is that operand 1 is encoded as its number in base 2 with 8 bits. The third rule is that operand 2, the third field, is encoded the same way, with 8 bits in base 2. So with these two ingredients I've shown that, first, I decided how many bits to use, and second, these rules define a correspondence between all 2 to the 18 possible symbols and their binary encodings.
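The counting argument can be checked with integer arithmetic in Python. The constant names are our own; `int.bit_length()` returns the number of base-2 digits of an integer, so `(count - 1).bit_length()` is the smallest n with 2 to the n at least `count` (for a count of at least 2).

```python
NUM_CODES = 4        # add, sub, mul, div
NUM_OPERAND = 256    # each operand is a natural number in 0..255

num_symbols = NUM_CODES * NUM_OPERAND * NUM_OPERAND

# Bits needed by each field and by the whole set: smallest n with 2**n >= count.
code_bits = (NUM_CODES - 1).bit_length()        # 2
operand_bits = (NUM_OPERAND - 1).bit_length()   # 8
total_bits = (num_symbols - 1).bit_length()     # 18

print(num_symbols, num_symbols == 2 ** 18)        # 262144 True
print(code_bits, operand_bits, total_bits)        # 2 8 18
print(code_bits + 2 * operand_bits == total_bits) # True
```

Because every field count is a power of 2, the per-field widths (2 + 8 + 8) add up exactly to the 18-bit minimum, with no combination wasted before the filler is added.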
Let's work through an example. Suppose I have the symbol add 53 28. This is a valid symbol of my set: it has the three fields I mentioned, next to each other, and it follows the format. Now we apply the correspondence rules to obtain its binary representation. We decided that add is represented by 00. Then we decided that operand 1 is represented by 8 bits in base 2. If you translate 53 into 8 bits, you obtain 00110101. Remember, you have to reach 8 bits; that's why this encoding has two extra 0s on the left, in the most significant positions. We apply the same rule to 28 and get 00011100. So this sequence corresponds to 53, and this one to 28. There is one thing we haven't mentioned yet. We decided to use 24 bits, but 2 bits followed by 8 bits followed by 8 bits gives a total of only 18 bits. So we need to add one more rule to our correspondence: we append 6 bits, all 0, which we'll call filler bits, at the right end. In other words, I have to extend my sequence of bits all the way up to 24, which is the size I decided in advance, so I just add six 0s here. So there we are: applying these three rules plus the filler rule that brings the representation up to 24 bits, I now know how to take any symbol of my set and represent it in binary, in this case 00 00110101 00011100 000000. And what's most important, I can reverse this translation. If anybody gives me this sequence of bits, I can discard the 6 bits on the right, because they are filler bits, then take the remaining 2 bits, 8 bits, and 8 bits, from left to right, and figure out which symbol it represents.
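The three rules plus the filler rule can be written as a pair of Python functions. This is a sketch under two assumptions: the symbol is given as its three fields, and the encoding is kept as a string of '0' and '1' characters for readability (a real system would pack the bits into bytes). The names `encode`, `decode`, and the tables are hypothetical.

```python
from typing import Tuple

# Hypothetical tables implementing the three UAL-1 rules plus the filler rule.
OPCODE_BITS = {"add": "00", "sub": "01", "mul": "10", "div": "11"}
BITS_OPCODE = {bits: code for code, bits in OPCODE_BITS.items()}
WIDTH = 24
FILLER = "0" * (WIDTH - (2 + 8 + 8))   # 6 filler bits, all 0, on the right

def encode(code: str, op1: int, op2: int) -> str:
    """Return the 24-bit encoding of a UAL-1 symbol as a string of '0'/'1'."""
    if not (0 <= op1 <= 255 and 0 <= op2 <= 255):
        raise ValueError("operands must be in 0..255")
    # format(x, "08b") writes x in base 2, padded with leading zeros to 8 bits.
    return OPCODE_BITS[code] + format(op1, "08b") + format(op2, "08b") + FILLER

def decode(bits: str) -> Tuple[str, int, int]:
    """Reverse the encoding: check and drop the filler, then read 2 + 8 + 8 bits."""
    if len(bits) != WIDTH or bits[18:] != FILLER:
        raise ValueError("not a valid 24-bit UAL-1 encoding")
    return BITS_OPCODE[bits[0:2]], int(bits[2:10], 2), int(bits[10:18], 2)

bits = encode("add", 53, 28)
print(bits)          # 000011010100011100000000
print(decode(bits))  # ('add', 53, 28)
```

Because every field has a fixed width and position, decoding is just slicing the string at positions 2, 10, and 18; no separators are needed between fields.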
For simplicity, when we manipulate long strings of bits like this one, we usually use their hexadecimal representation, which, remember, you obtain by splitting the bits into groups of 4. The first 4 bits, 0000, give the hex digit 0; the following 4 bits, 1101, give the digit D; the next 4 bits, 0100, give the digit 4; the next group, 0111, gives 7; and the remaining bits are two groups of 0s, so they give 0 and 0. So we have established that, following the rules described here, the symbol add 53 28 of our set is encoded as the binary sequence whose hexadecimal representation is 0D4700. And as I said before, if somebody gives you 0D4700, you can apply these rules in reverse and conclude that it represents add 53 28. And this is, in essence, the way computers encode the information they manipulate, in terms of sets of symbols.
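The hexadecimal step can be sketched in Python with the built-in `int(x, 2)`, `int(x, 16)`, and `format` functions. The helper names `bits_to_hex` and `hex_to_bits` are hypothetical, and the sketch assumes the bit string's length is a multiple of 4.

```python
def bits_to_hex(bits: str) -> str:
    """Convert a bit string whose length is a multiple of 4 to hex digits."""
    # One hex digit per group of 4 bits; zero-pad so leading 0 digits survive.
    return format(int(bits, 2), f"0{len(bits) // 4}X")

def hex_to_bits(hex_digits: str) -> str:
    """Convert hex digits back to a bit string with 4 bits per digit."""
    return format(int(hex_digits, 16), f"0{len(hex_digits) * 4}b")

bits = "000011010100011100000000"   # add 53 28 under the UAL-1 rules
groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]
print(" ".join(groups))               # 0000 1101 0100 0111 0000 0000
print(bits_to_hex(bits))              # 0D4700
print(hex_to_bits("0D4700") == bits)  # True
```

The zero-padding width matters: without it, the leading 0 digit of 0D4700 would be dropped and the 24-bit size of the encoding would no longer be visible.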