I started encountering strange boxes and characters and strings of text a long while ago. I first remember seeing them on my old cell phone, you know, the kind that only made phone calls and sent text messages. I didn't pay much attention to them, but eventually, from talking to my friends who had sent the messages, I began to understand that their smartphones had things called emoji and my phone couldn't receive them. As technology continued to grow around me, I started seeing those boxes everywhere. My Pebble watch could not render certain characters. My Kindle sometimes gave me funnier stand-ins when it couldn't figure out what the letters should be. Websites displayed question marks in the middle of words. And I grew to accept the annoyances and inconveniences of not being able to see some characters every once in a while, and I chalked it up to my usual answer: computers.

Once I started working for Typekit, I realized that folks at Adobe and Google had names for these strange little boxes. Some people called the small square tofu, which is easy to see once you've seen it. Others are self-explanatory, like the one with the X: not tofu, clearly. But really, this type of character is known as .notdef, as in not defined.

In the examples I started with, the missing characters were often ancillary to the conversation, and they didn't provide much friction to understanding what was being communicated. But that's a Western, first-world, English-speaking experience. The technology around us is built by us and for us first, often with regard for other regions of the world as an afterthought. In some regions, trying to read an e-book in your native language was close to impossible. Non-Latin character support was terrible, and you'd be lucky to be able to read half the characters, since most of them were those little .notdef boxes. That is, if you were able to open the file at all.

Hackers are going to hack.
And people took these problems into their own hands and developed unofficial patches for the devices they were trying to use so that they could read. Think about that for a moment. They were hacking devices so they could read. This is a huge barrier to literacy. Yeah, crazy. First, if you don't speak English, to some extent, you're limited in your access to the articles and how-tos written about how to hack a device. Then, if you don't have the tools needed to hack a device, you're stuck. Finally, you get to the problem I'd like to solve today. How do you know when a font has the characters you're looking for, to even patch into a hack? Needless to say, unless the problem is really burning at someone, they're not going to take the time to go through all of these hurdles. But some did.

So, to the task at hand. How do you know when a font has the characters you're looking for? To understand this, we need another set of questions. What is a character? How does a computer know what to put on a screen when we ask for a letter?

When we talk about a character, we're going to oversimplify it and just think of it as a single letter, such as the letter a. For a computer to understand how to render this single character, it needs to know a little more information. First, how is that character encoded? For the simplicity of this talk, we're going to focus on one character encoding standard, the Unicode Standard. In the Unicode Standard, every single character has a unique numeric value called a code point. The lowercase letter a always has the code point 97, no matter what font you are working with. The font file stores binary information in tables that map back to the code point, similar to a primary key, in order to give instructions to the device on how to properly render the chosen character.

One of the most important tables in a font is the cmap table, or the character to glyph index mapping table.
This is an index that maps each code point for the characters contained in the font to where the instructions for drawing it, the glyph, live. If we can find an entry in the cmap table that maps our code point to a glyph, we know for sure that the font will support the character.

Let's explore code points a little deeper. How do we find that a has a code point of 97, or find the code point of any other character for that matter? Unicode code points can also be represented as hexadecimal values, and you might have seen them represented as such before. For example, the hexadecimal Unicode code point for a is U+0061. If you can copy the character into an input box, you can search for it on fileformat.info, where it displays the Unicode hexadecimal code point for the character.

Now that we have the code point for our character, we need a way to look up whether it exists in a font file. All of the tables and data in a font file are stored in binary, so we'll want to use a library that will do the heavy lifting of parsing that data for us. Today, we'll use a Ruby library called ttfunk to help us out, as well as some of the Ruby standard library. So let's parse an open source font, Source Code Pro, and see if we can find a reference in the cmap table for the letter a.

Once we open the file, we'll be able to see all the tables that ttfunk has parsed for us. We want to make sure that it parsed a cmap table, since that's important in determining whether the character we're looking for will be supported. Next, we're going to assign a variable for our hexadecimal Unicode code point. We can drop the U+ that we found and just use the string value of 0061. Then we'll need to convert it to decimal in order to perform our lookup in the cmap table. Ruby's to_i method returns the string as an integer, interpreting it in the base given. Our string is a base 16 representation of the code point, so we pass 16 as the base.
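As a minimal sketch of that conversion step, using only the Ruby standard library: the helper name `code_point_from_label` is made up for illustration and is not part of ttfunk or of the script from the talk.

```ruby
# Sketch: turn a "U+XXXX" label into the integer code point used for lookups.
# `code_point_from_label` is a hypothetical helper name, not a library method.
def code_point_from_label(label)
  hex = label.sub(/\AU\+/i, "") # drop the optional "U+" prefix
  hex.to_i(16)                  # read the remaining digits as base 16
end

code_point = code_point_from_label("U+0061")

puts code_point                      # 97
puts [code_point].pack("U")          # a
puts "a".ord                         # 97
puts format("U+%04X", "a".ord)       # U+0061
```

Going the other way, `"a".ord` gives the code point directly when the character itself can be typed or pasted, so the hexadecimal label is mostly useful when all you have is a reference chart.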
Now that we have our code point of 97, we can see whether the parsed font's cmap table has an entry for our code point that maps back to the glyph ID we're looking for. Here we see that the glyph ID from the cmap table is a non-zero value. The non-zero value is an important part of our search, because the specification for a font file mandates that the glyph ID of zero is reserved for our friend, .notdef. So if the glyph ID is non-zero, we can tell the user that their font supports the character they're looking for.

To better illustrate the point, let's look at a character that is not found in most fonts. The hiragana character ro (ろ) has a code point of 12,429. When we look that code point up in the cmap table for the font Source Han Serif, we find that it has a glyph ID of 1536. When we type the character ro in Source Han Serif, we see it displayed on our computer, stylized by the instructions that map back to that glyph ID in the font. In contrast, when we look up the code point for ro in the font Source Code Pro, we find that it has a glyph ID of zero. When we type the character ro in Source Code Pro, we see it displayed on our computer, stylized by the instructions that map back to the .notdef glyph ID.

So how can we use this knowledge? Now that we have a basic understanding of how a computer knows what to put on a screen when we ask for a character, and how to find whether a font supports a certain character, my hope is that we no longer fear what we do not know. As creators of programs, we have the power to think about how people will use the things that we create. Understanding how the fonts we choose to display information affect people other than ourselves is an important part of our jobs. And if it's not a part of your job, you can help those around you find the fonts they need to support people around the world.
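For reference, the glyph ID check walked through earlier could be sketched roughly like this. It is an illustration under assumptions, not the script from the talk: the helper names are hypothetical, the font path in the usage note is a placeholder, and the stand-in hashes only mimic the two results quoted above (glyph ID 1536 in Source Han Serif, zero in Source Code Pro). The ttfunk calls used, `TTFunk::File.open` and `cmap.unicode`, are the ones I believe the library exposes; check them against the version you install.

```ruby
# Glyph ID 0 is reserved for .notdef, so any other ID means the font maps the character.
NOTDEF_GLYPH_ID = 0

# `subtable` is anything that answers [] with a glyph ID (or nil) for a code point,
# such as a Unicode cmap subtable from ttfunk or a plain Hash.
def glyph_id_for(subtable, code_point)
  subtable[code_point] || NOTDEF_GLYPH_ID
end

def supports_code_point?(subtable, code_point)
  glyph_id_for(subtable, code_point) != NOTDEF_GLYPH_ID
end

# Hypothetical helper: needs the ttfunk gem and a real font file on disk.
def unicode_subtable(font_path)
  require "ttfunk"
  TTFunk::File.open(font_path).cmap.unicode.first
end

ro = "308D".to_i(16) # 12429, the hiragana character ro

# Stand-ins for parsed fonts, so the sketch runs without ttfunk or font files.
han_serif_like = { ro => 1536 } # glyph ID quoted in the talk
code_pro_like  = {}             # no entry, so the lookup falls back to .notdef

puts supports_code_point?(han_serif_like, ro) # true
puts supports_code_point?(code_pro_like, ro)  # false
```

With ttfunk installed, the same check against a real font would look something like `supports_code_point?(unicode_subtable("path/to/font.ttf"), 97)`.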
Since our time here is short, I've added a small Ruby script to a GitHub repository for your own exploration. I hope you can use it to explore the magic of font files. If you want to work on the problem of making digital typography with a wide range of language support more accessible to the world, come work with me on the Typekit team. We're hiring engineers. Find me during the conference or on Twitter at Pizula. Thank you.