Now, in this course we're embedding English, because that's the language that you all share, but one can build embeddings for any language: Chinese, whichever. All of them involve some process of mapping from characters to words, grouping the characters together into a word. There aren't quite so many characters per word in Chinese, but it's the same idea; instead of tokenization it's called word segmentation, same idea. Or one can also use byte pair encodings, which matter most for the morphologically rich languages, languages like Finnish or Turkish that have lots of endings. English has only a little bit of morphology, an -ing or an -s or an -ly, but for languages with lots of morphology, not like Chinese, richer than English, byte pair encodings are important.

But there is something else you can do that's quite cool, which is to embed multiple languages simultaneously, and that's what I want to play with right now. The simplest way to do this is to train up a separate embedding for each language: do one for English, do one for Chinese. Then find a dictionary that maps between them: here's an English word, here's its Chinese translation. You could just go and buy a dictionary, but people mostly take parallel corpora, an English sentence paired with its Chinese translation, and automatically find which words are probably translations of each other.

Now what you can do is find a projection M, a matrix M, that maps the vector embedding of each word y sub i in, say, Chinese, the embedding of the Chinese word, so that the projected vector is as close as possible, minimizing the Euclidean distance, to the embedding of its English translation. So the English word is embedded here, the Chinese word is embedded here, you multiply by this weight matrix, you search over all the possible weight matrices, using gradient descent to find the minimum, and you call the minimizer M. You now have a mapping, a projection, from Chinese to English.

If you want to be even fancier, instead of making this a linear projection you could put a whole neural net in there and do a non-linear mapping between the Chinese words and the English words. Either way, words that are close in Chinese and words that are close in English get mapped together, and it means you can now do really nice things like look at Chinese words in the same space as English words. That makes it much easier to understand what's going on when you're doing the kinds of analyses I often do, trying to understand, for example, how emotion expression in English compares to Chinese, or how Twitter usage is similar to or different from Sina Weibo usage. Having the words from the different languages embedded in the same space makes it much easier to understand what's going on and to generalize models across languages.
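
To make the projection step concrete, here is a minimal sketch of learning that matrix M by gradient descent. It assumes you already have two aligned embedding matrices, zh_vecs and en_vecs, where row i holds a Chinese word's vector and the vector of its English translation (e.g. pairs mined from a parallel corpus); the array names, sizes, learning rate, and the random stand-in data are all made up for illustration, not the lecture's exact setup.

```python
import numpy as np

# Sketch: learn a linear projection M that maps Chinese word vectors into
# the English embedding space, minimizing the squared Euclidean distance
# between each projected Chinese vector and its English translation.
rng = np.random.default_rng(0)
n_pairs, dim = 5000, 300                      # hypothetical dictionary size / embedding dim
zh_vecs = rng.normal(size=(n_pairs, dim))     # stand-in Chinese embeddings (row i = word i)
en_vecs = rng.normal(size=(n_pairs, dim))     # stand-in embeddings of their English translations

M = rng.normal(scale=0.01, size=(dim, dim))   # projection matrix to be learned
lr = 0.1

for step in range(200):
    pred = zh_vecs @ M                        # project every Chinese vector
    err = pred - en_vecs                      # residual against the English vectors
    loss = np.mean(np.sum(err ** 2, axis=1))  # mean squared Euclidean distance
    grad = 2.0 * zh_vecs.T @ err / n_pairs    # gradient of the loss with respect to M
    M -= lr * grad                            # gradient-descent update
    if step % 50 == 0:
        print(f"step {step:3d}  loss {loss:.3f}")

# After training, zh_vecs @ M lives in (roughly) the same space as en_vecs,
# so nearest-neighbour lookups across the two languages become meaningful.
```

The fancier non-linear version mentioned above fits the same loop: replace the single matrix M with a small neural network, compute pred with that network, and update its parameters instead of M.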