What is positional encoding and why do transformers need it? Transformers are sequence-to-sequence models that replaced recurrent networks as the state of the art. A transformer has two parts, an encoder and a decoder, and the encoder receives all of the word vectors in a sequence simultaneously. Processing them in parallel rather than one at a time speeds up training compared with recurrent networks, but it also discards something essential: in language, word order matters. So when the words are passed in in parallel, the model needs another way to know where each word sits in the sequence. To provide this, we add to every word vector a position vector of the same shape. The resulting vector then carries both information about the word's meaning and information about its position, all in one vector.
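One common way to build those position vectors is the fixed sinusoidal encoding from the original Transformer paper, where even dimensions use sines and odd dimensions use cosines at different frequencies. The sketch below is a minimal illustration of that idea, assuming NumPy and made-up sizes (3 words, embedding dimension 8); it is not the only possible scheme, just one concrete instance of adding a position vector of equal shape to each word vector.

```python
import numpy as np

def sinusoidal_position_encoding(seq_len, d_model):
    """Build a (seq_len, d_model) matrix of fixed sinusoidal position vectors."""
    positions = np.arange(seq_len)[:, np.newaxis]        # (seq_len, 1)
    dims = np.arange(d_model)[np.newaxis, :]             # (1, d_model)
    # Each dimension pair (2i, 2i+1) shares the frequency 1 / 10000^(2i / d_model).
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                      # (seq_len, d_model)
    encoding = np.zeros((seq_len, d_model))
    encoding[:, 0::2] = np.sin(angles[:, 0::2])           # even dimensions: sine
    encoding[:, 1::2] = np.cos(angles[:, 1::2])           # odd dimensions: cosine
    return encoding

# Hypothetical example: 3 word vectors with embedding size 8.
word_vectors = np.random.randn(3, 8)                      # stand-in word embeddings
position_vectors = sinusoidal_position_encoding(3, 8)     # same shape as the word vectors
encoder_input = word_vectors + position_vectors           # meaning + position in one vector
print(encoder_input.shape)                                # (3, 8)
```

Because the position vectors are added element-wise, the encoder input keeps the same shape as the original embeddings, and the same position vector can be reused for every sequence regardless of its content.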