attention. It answers the question: which parts of the input should I focus on? Suppose we are translating from English to French and using self-attention, that is, attention of a sentence with respect to itself. The question we want to answer is: how relevant is the i-th word in the English sentence to the other words in the same sentence? This relevance is captured in the i-th attention vector, which is computed in the attention block. For every word, the attention block generates an attention vector that captures the contextual relationships between the words in the sentence. So that's great!
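
To make this concrete, here is a minimal NumPy sketch of scaled dot-product self-attention. The projection matrices `W_q`, `W_k`, `W_v` and the toy dimensions are illustrative assumptions, not taken from any particular model; the point is just to show how each row of the attention matrix becomes the attention vector for one word.

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over a sequence of word embeddings.

    X:             (seq_len, d_model) matrix, one embedding per word.
    W_q, W_k, W_v: projection matrices mapping embeddings to
                   queries, keys, and values.
    Returns the attended outputs and the attention matrix, whose
    i-th row is the attention vector for the i-th word.
    """
    Q = X @ W_q  # queries: what each word is looking for
    K = X @ W_k  # keys:    what each word offers
    V = X @ W_v  # values:  the content that gets mixed together
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # relevance of word j to word i
    # Row-wise softmax turns scores into attention weights that sum to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Toy usage: a 4-word "sentence" with 8-dimensional embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
out, attn = self_attention(X, W_q, W_k, W_v)
print(attn[0])  # attention vector of the 1st word over all 4 words
```

Each row of `attn` sums to 1, so the attention vector for the i-th word is a probability distribution telling us how much that word should attend to every word in the sentence, including itself.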