Computers don't understand words; they work with numbers, vectors, and matrices. The idea is to map every word to a point in space where words with similar meanings sit physically closer to each other. The space they live in is called an embedding space. We could pre-train this embedding space ourselves to save time, or simply use one that has already been pre-trained. The embedding maps each word to a vector, but it assigns the same vector no matter where the word appears in the sentence, and word order matters. This is where positional encoding comes in: it's a vector that carries information about the positions of, and distances between, words in the sentence. The original paper uses sine and cosine functions to generate this vector, but it could be any reasonable function. After passing the English sentence through the input embedding and adding the positional encoding, we get word vectors that carry positional information, that is, context. Nice.
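To make this concrete, here's a minimal NumPy sketch of the sinusoidal positional encoding described in the original paper: even dimensions use sine, odd dimensions use cosine, and the result is simply added to the word embeddings. The function name and the random "embeddings" below are just placeholders for illustration, not part of any particular library.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    # Sinusoidal encoding from "Attention Is All You Need":
    # even dimensions get sin, odd dimensions get cos.
    positions = np.arange(seq_len)[:, np.newaxis]        # shape (seq_len, 1)
    dims = np.arange(d_model)[np.newaxis, :]              # shape (1, d_model)
    angle_rates = 1.0 / np.power(10000, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                       # shape (seq_len, d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])
    pe[:, 1::2] = np.cos(angles[:, 1::2])
    return pe

# Toy example: 4 "word vectors" of dimension 8 standing in for real
# pre-trained embeddings, plus their positional encodings.
d_model = 8
embeddings = np.random.randn(4, d_model)   # placeholder for an embedding lookup
encoded = embeddings + positional_encoding(4, d_model)
print(encoded.shape)  # (4, 8)
```

The key property is that every position gets a unique, deterministic vector of the same dimension as the embedding, so the two can be added together before being fed to the encoder.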