Let's code the transformer encoder. The encoder class inherits from PyTorch's nn.Module, which lets us define a forward pass, access the model's parameters, and get gradient updates, among many other things. In the forward pass, we take some initial word embeddings and pass them through multiple encoder layers. Each encoder layer performs attention and layer normalization, then passes the result through some feed-forward layers. The output of the encoder is a set of word vectors that better represent each word in its context.
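To make this concrete, here is a minimal sketch of what that encoder might look like in PyTorch. The class names, the dimensions (512-dimensional embeddings, 8 heads, 6 layers), and the use of nn.MultiheadAttention for the attention step are illustrative assumptions rather than the exact code from this walkthrough.

```python
import torch
import torch.nn as nn


class EncoderLayer(nn.Module):
    """One encoder layer: self-attention, then a position-wise feed-forward
    block, each followed by a residual connection and layer normalization.
    (Assumed structure for illustration.)"""

    def __init__(self, d_model=512, num_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(
            d_model, num_heads, dropout=dropout, batch_first=True
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        # Self-attention: every position attends to every other position.
        attn_out, _ = self.self_attn(x, x, x)
        x = self.norm1(x + self.dropout(attn_out))  # residual + layer norm
        # Position-wise feed-forward block.
        ff_out = self.ff(x)
        x = self.norm2(x + self.dropout(ff_out))    # residual + layer norm
        return x


class Encoder(nn.Module):
    """Stack of encoder layers. Inheriting from nn.Module gives us parameter
    registration, gradient tracking, and the forward() mechanism for free."""

    def __init__(self, num_layers=6, d_model=512, num_heads=8, d_ff=2048):
        super().__init__()
        self.layers = nn.ModuleList(
            [EncoderLayer(d_model, num_heads, d_ff) for _ in range(num_layers)]
        )

    def forward(self, x):
        # x: initial word embeddings, shape (batch, seq_len, d_model).
        for layer in self.layers:
            x = layer(x)
        # Output: contextualized word vectors with the same shape as the input.
        return x


# Example usage: a batch of 2 sentences, 10 tokens each, embedding size 512.
embeddings = torch.randn(2, 10, 512)
encoder = Encoder()
contextual_vectors = encoder(embeddings)
print(contextual_vectors.shape)  # torch.Size([2, 10, 512])
```

The key design point this sketch illustrates is that the encoder does not change the shape of its input: it maps one vector per token to a new vector per token, with each output vector now informed by the surrounding words through attention.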