How do we take these inputs and feed them to the transformer encoder layers? Let's blow this up. We start with the input string, which we pad (or truncate) to whatever we set as the maximum input sequence length. These tokens could be words or subwords. Each word or subword is then encoded as a one-hot vector, so overall we end up with a tensor of shape maximum sequence length by maximum vocabulary size, where exactly one entry per token is set to one. These one-hot vectors are sparse, so we project each of them into a dense 512-dimensional embedding vector. We can then compute position vectors using sine and cosine functions and add them to the embedding vectors. This leaves us with a 512-dimensional vector for every single token, which we can now feed to the transformer encoder for processing.
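The pipeline described above can be sketched in a few lines of numpy. This is a minimal illustration, not a real implementation: the vocabulary size, sequence length, token ids, and the padding-id convention are all made-up assumptions, and the embedding matrix is random rather than learned. The embedding dimension of 512 and the sine/cosine positional encoding follow the description in the text.

```python
import numpy as np

# Illustrative assumptions (not from the text, except d_model = 512)
vocab_size = 10000   # assumed maximum vocabulary size
max_len = 8          # assumed maximum input sequence length
d_model = 512        # embedding dimension, as described above

# Token ids for a padded input sequence (id 0 used as padding, an assumption)
token_ids = np.array([5, 42, 7, 993, 0, 0, 0, 0])

# One-hot encoding: tensor of shape (max_len, vocab_size),
# with exactly one entry per token set to one
one_hot = np.zeros((max_len, vocab_size))
one_hot[np.arange(max_len), token_ids] = 1.0

# Dense embedding: multiplying the sparse one-hot rows by a
# (vocab_size, d_model) matrix is equivalent to an embedding lookup
embedding_matrix = np.random.randn(vocab_size, d_model) * 0.02
embeddings = one_hot @ embedding_matrix          # (max_len, d_model)

# Sinusoidal positional encodings: sine on even dims, cosine on odd dims
pos = np.arange(max_len)[:, None]                # (max_len, 1)
i = np.arange(d_model // 2)[None, :]             # (1, d_model/2)
angles = pos / np.power(10000.0, 2 * i / d_model)
pe = np.zeros((max_len, d_model))
pe[:, 0::2] = np.sin(angles)
pe[:, 1::2] = np.cos(angles)

# Add position information to the token embeddings;
# the result is one 512-dimensional vector per token
encoder_input = embeddings + pe                  # (max_len, d_model)
print(encoder_input.shape)
```

The one-hot step is shown explicitly here to mirror the explanation; in practice frameworks skip it and index the embedding matrix directly with the token ids, which gives the same result without materializing the sparse tensor.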