So last year, Meetup added support for Japanese. This was a fascinating project in many ways, and this presentation is about one of the more interesting things we encountered. It begins here. We have a flow where we present you with a list of categories of Meetup groups that you might be interested in. We do this to learn more about you so that we can connect you with the right Meetups. So we have arts, beliefs, book clubs, career, business, et cetera. This list is sorted in alphabetical order, which makes it easier to scan. Like everything else on our platform, we translated this step into Japanese. And as a non-Japanese speaker (this is very important), it looked totally fine to me. But our Japanese testers pointed out that the list was now not in any discernible order. It was not alphabetically sorted at all. So what's going on here?

Let's unpack this problem by looking at an example of sorting. Here's a basic example in Java for sorting integers. We take a list of unsorted ints and pass it into the sort function, and a natural sort is performed: the numbers are now in order. Super easy. We can also do this for strings. We create a list of unordered strings, call sort on it, and they are now naturally sorted according to their characters, so you have b before j, before m, before r, before s. Easy, right? Easy for now. Let's break this.

We take the same list of strings and add a string starting with a capital letter. When we call sort on this new list, the capitalized string ends up in the wrong place, or at least wrong for English speakers. It should be in the middle of the list next to Morty, not before Beth. Without going too deep into it, this is because the natural sort compares strings by their character codes, and every uppercase letter has a lower code value than every lowercase letter. So a string starting with a capital letter sorts ahead of all the lowercase ones instead of where English speakers would expect it.
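The slides with the code aren't reproduced in this transcript, so here is a minimal sketch of what those natural sorts might look like, assuming Collections.sort on a List. Apart from Beth and Morty, the names (including the capitalized "Meeseeks") are hypothetical stand-ins for the ones on the slide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class NaturalSortDemo {

    // Sorts a copy of the input using the elements' natural ordering (Comparable):
    // numeric order for Integer, UTF-16 code-unit order for String.
    static <T extends Comparable<? super T>> List<T> naturalSort(List<T> input) {
        List<T> copy = new ArrayList<>(input);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(naturalSort(Arrays.asList(42, 7, 19, 3)));
        // [3, 7, 19, 42]

        // Hypothetical names, all lowercase.
        List<String> names = Arrays.asList("rick", "beth", "summer", "jerry", "morty");
        System.out.println(naturalSort(names));
        // [beth, jerry, morty, rick, summer]

        // 'M' is U+004D, while every lowercase letter is U+0061 or higher,
        // so the capitalized name sorts ahead of all the lowercase ones.
        List<String> withCapital = new ArrayList<>(names);
        withCapital.add("Meeseeks");
        System.out.println(naturalSort(withCapital));
        // [Meeseeks, beth, jerry, morty, rick, summer]
    }
}
```

The last line reproduces the problem from the talk: the comparison knows nothing about language, only about character codes.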
But how do we define the proper order? American English speakers in the room would expect to see capital letters sorted alongside their lowercase brethren. So how do we tell Java to sort it this way? Java has a class for performing language-sensitive comparisons of strings: the Collator class. What is a collator? Here's the definition from Google. All right. I think we need to go one level deeper here. To collate is to collect and combine text information or sets of figures in the proper order. But how does this actually apply to Java? From the Java docs, the Collator class performs locale-sensitive String comparison, and you use this class to build searching and sorting routines for natural language text. Great. This sounds exactly like what we want. So how do we actually use it?

First, we create a Collator instance based on a locale with language and country information. In this case, we're creating a Collator based on the language "en", which is English, and the country "US", which is the United States. So this is a Collator that represents English as it's spoken in the United States. Taking the same list as before, with the capitalized string added into the mix, we pass this Collator as the second argument to the sort method, which tells the sort method to use the Collator as the basis of comparison. Let's take a look at the result. All right, the capitalized string is now in the right place. Good work, Collators.

Now let's bring this back to our Japanese example. We define a Collator for Japanese, "ja", as spoken in Japan, "JP". We grab our category names and sort them using our Japanese Collator. Then we return the results and use them to generate the view from earlier. A couple of the categories were swapped around; they're highlighted in red up there. So we're done, right? We're not done. We're not done at all.
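As a rough sketch of the Collator step, assuming java.text.Collator and the same hypothetical names as before, the sort might look like this. Locale.US is the built-in constant for language "en" and country "US".

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class CollatorSortDemo {

    // Sorts a copy of the input using the locale-sensitive rules of a Collator.
    static List<String> collatedSort(List<String> input, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        List<String> copy = new ArrayList<>(input);
        // Collator implements Comparator<Object>, so it can be passed
        // as the second argument to Collections.sort.
        Collections.sort(copy, collator);
        return copy;
    }

    public static void main(String[] args) {
        // Hypothetical names, including the capitalized one.
        List<String> names = Arrays.asList("rick", "beth", "summer", "jerry", "morty", "Meeseeks");
        System.out.println(collatedSort(names, Locale.US));
        // [beth, jerry, Meeseeks, morty, rick, summer]
    }
}
```

At its default strength the Collator compares base letters first and only uses case as a tie-breaker, which is why "Meeseeks" now lands next to "morty".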
The Japanese testers told us that this list was still not alphabetically ordered, although the two categories that swapped were now correct relative to each other. So we did something. But to see what we had to do next, we need some background on Japanese itself, and specifically on Japanese writing systems. There are four of them. The first is kanji, a set of symbols borrowed from Chinese. Each symbol represents a concept, and these concepts can be joined together to create words. The other three (hiragana, katakana, and romaji) are phonetic alphabets, meaning that each character represents a sound. Each is used for slightly different things, but the most important thing to know is that our category list contains strings written in all of them.

So how do you actually order these? The gojūon, or "fifty sounds", ordering is the most common system for ordering Japanese words, and it's used in dictionaries, among other things. A word's position is determined by its pronunciation, starting with its first syllable. You can use this chart to see what the proper order should be. Starting at the top left corner and working your way down, you go a, i, u, e, o, ka, ki, ku, ke, ko, sa, si, su, se, so, and I'll spare you the rest and just say you keep going down the chart. The ordering is defined for the phonetic alphabets but not for kanji directly. Kanji can still be ordered this way, but their position depends on the pronunciation of the word as if it were written in hiragana or katakana. The problem is that a kanji word can have several different pronunciations, and picking the right one requires context that only a human can reliably judge. Here's an example: the kanji for "today" (今日) can be pronounced either kyō or konnichi, depending on the context in which the word is used. So the alphabetical position of the word depends on context as well, and there's currently no way for a computer to reliably predict the appropriate context for every word.
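For words written in a phonetic alphabet, a Japanese Collator can produce gojūon order, because the order is defined directly on the kana. A minimal sketch, assuming Locale.JAPAN (ja_JP) and a few hypothetical hiragana words: さくら (sakura), あめ (ame), ほん (hon), かさ (kasa), いぬ (inu). Kanji are deliberately left out, since the Collator has no access to a word's reading and so cannot place kanji by pronunciation.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class GojuonSortDemo {

    // Sorts a copy of the input with a Japanese (ja_JP) Collator.
    static List<String> sortJapanese(List<String> words) {
        Collator collator = Collator.getInstance(Locale.JAPAN);
        List<String> copy = new ArrayList<>(words);
        Collections.sort(copy, collator);
        return copy;
    }

    public static void main(String[] args) {
        // Hypothetical hiragana words, each starting with a different syllable.
        List<String> kana = Arrays.asList("さくら", "あめ", "ほん", "かさ", "いぬ");
        System.out.println(sortJapanese(kana));
        // [あめ, いぬ, かさ, さくら, ほん]  (a, i, ka, sa, ho)
    }
}
```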
So let's take a look at a similar concept in English. Here are two sentences with the same word obnoxiously highlighted: "I would like to read a book" versus "Have you read this book?" The word "read" has different pronunciations and even different meanings, depending on the context. This is less of a problem in English, though, where alphabetical order is derived purely from the characters themselves, and pronunciation doesn't matter. In Japanese, the ordering is tied much more closely to how the word is pronounced. That's well defined for the phonetic alphabets but not for kanji. So if we want to use kanji to express certain ideas and content, we need to get creative.

Here was our solution: we decoupled the display language from the language that we sorted on. Here's an example of some of our categories. For each one, we got the hiragana transliteration from our Japanese translators and used that to sort instead. In total we only had about 24 categories to transliterate, so it was actually pretty straightforward. To determine how to order the categories, we sorted the phonetic transliterations with a Japanese Collator, but continued to display the original Japanese copy, no matter what alphabet it was written in. And here's the final version. Our Japanese testers were now happy with the ordering, and therefore so was I. Problem solved.

But not really. We managed to solve this instance of the problem, but not the general one. Our solution doesn't scale well, because we still needed a human to transliterate every single word. That becomes a huge problem once you incorporate user-generated content: with thousands or millions of strings that might need to be sorted, it quickly becomes impossible. In my opinion, the best solution is simply to avoid alphabetical sorting. Maybe there's a better way to sort your list that avoids this problem entirely.
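The decoupling can be sketched as a small data structure that pairs each displayed string with a human-supplied hiragana reading used only as the sort key. This is an illustration under assumptions, not Meetup's code: the Category class, the sampleCategories helper, and the six categories and readings are all hypothetical.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class ReadingSortDemo {

    // Hypothetical pairing of display copy with its hiragana reading.
    static final class Category {
        final String display;  // shown to the user (kanji, katakana, ...)
        final String reading;  // hiragana transliteration, used only for sorting

        Category(String display, String reading) {
            this.display = display;
            this.reading = reading;
        }
    }

    // Sorts by reading with a Japanese Collator, then returns the display strings.
    static List<String> sortByReading(List<Category> categories) {
        Collator collator = Collator.getInstance(Locale.JAPAN);
        List<Category> copy = new ArrayList<>(categories);
        copy.sort((a, b) -> collator.compare(a.reading, b.reading));
        List<String> displayed = new ArrayList<>();
        for (Category c : copy) {
            displayed.add(c.display);
        }
        return displayed;
    }

    // Hypothetical sample data, not Meetup's actual category list.
    static List<Category> sampleCategories() {
        return Arrays.asList(
                new Category("本", "ほん"),                     // books
                new Category("テクノロジー", "てくのろじー"),   // technology
                new Category("音楽", "おんがく"),               // music
                new Category("写真", "しゃしん"),               // photography
                new Category("映画", "えいが"),                 // film
                new Category("家族", "かぞく"));                // family
    }

    public static void main(String[] args) {
        System.out.println(sortByReading(sampleCategories()));
        // [映画, 音楽, 家族, 写真, テクノロジー, 本]
    }
}
```

Keeping the reading as a separate field means the display copy can use whichever script reads best, while the order depends only on data the Collator can actually interpret. The cost, as the talk notes, is that someone has to supply a reading for every string.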
But there are other solutions out there. For example, when you sign up on Amazon's Japanese website, it asks you to provide a phonetic spelling of your name in addition to a display name, in order to solve problems similar to the one I've outlined. But the bottom line (look familiar?) is that language is hard. Don't design around text. Don't hinge the functionality of a page on any assumptions about how your text is laid out, whether in length or in ordering. Try to say more with less. And lastly, if developing a global product is important to you, think hard about the assumptions you're making about language, and challenge them. Thank you.