So far we've focused on representing numbers in our computer. We've used our zeros and ones to represent various integers and floating point numbers. But we'd also like to be able to represent text. We'd like to be able to read and write and interact with our computer using letters as well as numbers. But since our computer really only understands ones and zeros, we have to find a way to map the glyphs from the languages that we understand into those ones and zeros that the computer can understand. We've come up with a bunch of different ways of doing this over the years, but the three most enduring methods have been ASCII, extended ASCII, and Unicode.

ASCII was the first major format developed. ASCII includes several different things: uppercase English characters, lowercase English characters, digits, common punctuation that we use in English, a handful of symbols, and some non-printing characters. But there's only room for 128 characters. 128 different characters is more than enough for pretty much anything you'd want to write in English. The nice thing is that this means we can pack any individual character into just seven bits; we only need seven bits to represent a character. But you'll notice I used the word English a whole lot in there. ASCII was primarily developed by US developers with some input from British developers, so it works really well for US English and reasonably well for British English. There are a few things missing, like the euro or the pound sign, but most things they'd want to write are fine. For any other language beyond that, it goes downhill, and it goes downhill pretty quickly.

So extended ASCII provides an eighth bit. We group bits into eights to make a byte anyway, so it seems reasonable to just add one more bit. This gives us another 128 characters that we can use, which allows us to fold the characters from a few more languages into an extended ASCII set. It really won't get us a whole lot of characters, so we can only support a handful of extra languages with each set. So we actually end up with a series of different extended ASCII formats, where each one supports a handful of different languages that we might be interested in writing. The Western European set can support English as well as Spanish, French, and German, because now we've got enough room to add things like accents, tildes, and umlauts. But it's not so good for other things: there's no room to put Greek or Cyrillic characters in there. So we have other forms of extended ASCII that do support those.

On the other hand, you've probably noticed that there are still some languages missing from here. Thai is pretty much the only thing east of the Mediterranean that's supported. So we needed to come up with a character format that could potentially support any language. With extended ASCII, we had two problems. One was that we can't represent all languages. The other is that once you've selected a character set, you're really limited to just the languages it covers. Once you've chosen the Western European version of extended ASCII, you no longer have any way to access the Greek or Cyrillic characters; those simply aren't available in that character set. So the goal with Unicode was to solve both of those problems. We needed a character set large enough to include all of the characters from every language we might ever possibly want to write in, as well as anything else that we might want to include as text.
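To make the seven-bit idea concrete, here's a minimal sketch in Python (the lecture doesn't specify a language, so that choice is an assumption) showing that each ASCII character's code fits in seven binary digits:

```python
# Every ASCII character has a code between 0 and 127,
# so seven bits are always enough to hold it.
for ch in "Hi!":
    code = ord(ch)                        # the character's numeric code
    print(ch, code, format(code, "07b")) # same value, shown as seven bits
```

And here's a sketch of the code-page problem with extended ASCII: the very same byte decodes to a different character depending on which extended set you've chosen. The byte value 0xE9 is just an illustrative pick:

```python
# One byte, three different meanings under three extended ASCII sets.
b = bytes([0xE9])
print(b.decode("latin-1"))    # 'é' in the Western European set (ISO 8859-1)
print(b.decode("iso8859-7"))  # 'ι' in the Greek set (ISO 8859-7)
print(b.decode("cp1251"))     # 'й' in Windows Cyrillic (code page 1251)
```

This is exactly why picking one extended ASCII set locks out the others: the byte values above 127 only mean one thing at a time.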
So since 8 bits was insufficient, Unicode uses more than 8 bits. There are three major forms of Unicode. The UTF-8 format is one of the most popular because it's usually the smallest. Each character takes at least 8 bits, but it can potentially be larger; we can take up a full 32 bits if we need to, but if we can represent a character just using ASCII, then we only need one byte for that character. UTF-16 is also a variable-width format, but now we're starting with 16 bits, whereas the UTF-32 format just allocates 4 bytes for every character in your strings.

This gives us a huge amount of space to represent characters. The current Unicode standard has over 125,000 different characters, which, as you can imagine, is enough to support Latin, Arabic, Cyrillic, Greek, Hebrew, Hiragana, and Katakana, as well as scripts that you may never have heard of and may never have any interest in writing in, like Cherokee or Ogham. There's also room to support things like Braille, as well as a ton of different symbols.

We've gradually moved from systems that just supported ASCII, to extended ASCII, and now most systems try to support Unicode. It makes it much easier if you want to localize your program for another language. If you want to support Japanese or Chinese or Russian and English at the same time, you can do that with Unicode. You're just changing out the actual characters in your strings; you don't have to change out formats, which might actually be limited at the operating system level. You can really just switch one set of strings for another. So most operating systems have switched over to using Unicode, and most languages are switching to Unicode where possible, just because it makes life much easier for anybody developing for languages other than English.
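You can see the variable-width versus fixed-width trade-off directly by encoding a few characters and counting bytes. A minimal sketch, again in Python, with the sample characters being illustrative choices:

```python
# Bytes needed per character under each Unicode encoding form.
# UTF-8 and UTF-16 are variable width; UTF-32 is always 4 bytes.
for ch in ["A", "é", "я", "中", "🎉"]:
    print(ch,
          len(ch.encode("utf-8")),     # 1 to 4 bytes; plain ASCII stays at 1
          len(ch.encode("utf-16-le")), # 2 bytes, or 4 for characters like emoji
          len(ch.encode("utf-32-le"))) # always 4 bytes
```

Note that for a character like 中, UTF-16 is actually smaller than UTF-8, which is part of why more than one form survives. And the localization point amounts to nothing more than swapping strings; a hypothetical sketch:

```python
# With Unicode, supporting another language means changing the strings,
# not the character format.
greetings = {"en": "Hello", "ru": "Привет", "ja": "こんにちは"}
for lang, text in greetings.items():
    print(lang, text)
```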