Fun facts on transformer positional encodings. These positional encodings are generated from sine and cosine functions, so every value they contain lies between negative one and positive one. They exist because the encoder takes in all the words of a sequence simultaneously, yet the order of those words matters: the first word is followed by the second, the second by the third, and so on. This ordering information is injected by a positional encoder. In the original paper, the positional encodings are not learnable parameters, although they can be configured to be learnable if needed. At the time of making this video, there is no evidence that learnable positional parameters work better than simply generating the encodings from sine and cosine functions. I have more information on this in another video on positional encodings with code.
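As a quick illustration, here is a minimal sketch of how those sinusoidal encodings can be computed, following the formulas from the original Transformer paper (even dimensions use sine, odd dimensions use cosine, each pair sharing one frequency). The function name and the use of NumPy are my own choices for this example, not anything prescribed by the paper.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Fixed sinusoidal positional encodings: sin on even dimensions,
    cos on odd dimensions, so every value lies in [-1, 1]."""
    positions = np.arange(seq_len)[:, np.newaxis]   # shape (seq_len, 1)
    dims = np.arange(d_model)[np.newaxis, :]        # shape (1, d_model)
    # Each dimension pair (2i, 2i+1) shares the same frequency 1 / 10000^(2i / d_model).
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                # shape (seq_len, d_model)
    encoding = np.zeros((seq_len, d_model))
    encoding[:, 0::2] = np.sin(angles[:, 0::2])     # even indices: sine
    encoding[:, 1::2] = np.cos(angles[:, 1::2])     # odd indices: cosine
    return encoding

# Example: a 10-token sequence with model dimension 16.
pe = sinusoidal_positional_encoding(10, 16)
print(pe.shape)              # (10, 16)
print(pe.min(), pe.max())    # both values stay within [-1, 1]
```

In practice this matrix is simply added to the word embeddings before they enter the encoder, which is how the model gets access to token positions despite processing the whole sequence at once.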