5 Concepts in Transformers

1. Look-ahead mask
One of the two kinds of masking in a transformer. It ensures that a word cannot derive context from the words that come after it in the sequence, so each position only attends to earlier positions.

2. Padding mask
When sentences are fed to a transformer, they are padded so that every sequence in a batch has the same length. The padding tokens carry no meaning, so the padding mask ensures that no attention is paid to them. (A small sketch of both masks follows this list.)

3. Attention head
A head that performs the attention operation. A transformer runs several such heads in parallel; they can be computed efficiently side by side, and each head can learn to attend to different relationships between the words.

4. Self-attention
The words of a sentence pay attention to the other words in the same sentence, producing word vectors that capture context from within that sentence.

5. Cross-attention
The words of a sentence pay attention to the words of another sentence (for example, the decoder attending to the encoder's output), again producing word vectors that understand context. (See the attention sketch after this list.)

To build a transformer from scratch, do check out this playlist on the channel.
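As a rough illustration of the first two concepts, here is a minimal PyTorch sketch of how the two masks might be constructed. The shapes, the pad token id of 0, and the function names are assumptions made for this example, not taken from the playlist.

import torch

def look_ahead_mask(seq_len: int) -> torch.Tensor:
    # True above the diagonal: position i may not attend to positions j > i.
    return torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)

def padding_mask(token_ids: torch.Tensor, pad_id: int = 0) -> torch.Tensor:
    # True wherever the token is padding; shaped (batch, 1, 1, seq_len)
    # so it broadcasts over attention heads and query positions.
    return (token_ids == pad_id)[:, None, None, :]

# A batch of 2 sentences padded to length 5 with pad_id = 0.
tokens = torch.tensor([[7, 2, 9, 0, 0],
                       [4, 8, 1, 3, 5]])
print(look_ahead_mask(5))
print(padding_mask(tokens))

Positions marked True are the ones that get blocked, typically by setting their attention scores to negative infinity before the softmax.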
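To make the self- vs cross-attention distinction concrete, here is a hedged sketch of scaled dot-product attention: in self-attention the queries, keys, and values all come from the same sentence, while in cross-attention the queries come from one sentence and the keys and values from another. The dimensions and variable names are illustrative assumptions.

import math
import torch

def attention(q, k, v, mask=None):
    # q: (..., len_q, d), k and v: (..., len_k, d)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    if mask is not None:
        scores = scores.masked_fill(mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

d_model = 8
src = torch.randn(3, d_model)   # the "other" sentence, e.g. encoder output
tgt = torch.randn(4, d_model)   # the current sentence

self_attn = attention(tgt, tgt, tgt)    # words attend within the same sentence
cross_attn = attention(tgt, src, src)   # words attend to the other sentence
print(self_attn.shape, cross_attn.shape)

A multi-head layer simply runs this operation several times in parallel on different learned projections of q, k, and v and concatenates the results.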