Okay, one at a time. Attention. It answers the question: what part of the input should I focus on? If we are translating from English to French and we are doing self-attention, that is, attention of the sentence with respect to itself, the question we want to answer is: how relevant is the i-th word in the English sentence to the other words in the same English sentence? This relevance is captured in the i-th attention vector, and it is computed in the attention block.
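To make that concrete, here is a minimal sketch of scaled dot-product self-attention, the standard way those attention vectors are computed. It assumes learned projection matrices (the names Wq, Wk, Wv and the function self_attention are illustrative, not from the original); row i of the output is the attention vector for the i-th word.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over one sentence.

    X          : (n, d)   one embedding row per word in the sentence
    Wq, Wk, Wv : (d, d_k) projection matrices (learned during training;
                 random here just for the sketch)
    Returns an (n, d_k) matrix whose i-th row is the attention vector
    for the i-th word.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # scores[i, j] ~ how relevant word j is to word i
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # softmax over each row turns scores into weights that sum to 1
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # each output row is a relevance-weighted mix of the value vectors
    return weights @ V

# Toy example: a 4-word "sentence" with 8-dimensional embeddings.
rng = np.random.default_rng(0)
n, d, d_k = 4, 8, 8
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d_k)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one attention vector per word
```

Because every word attends to every other word in the same sentence, the score matrix is n by n; that is exactly the "how relevant is word i to word j" question above, asked for all pairs at once.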