The second of our architectures is tagging. Tagging is a supervised learning task where we have a sequence of words, for example, and every word has a label on it. So we have the word "I", which would then be labeled pronoun. We have the word "went", which is labeled verb. The word "to", which is labeled preposition. The word "and", which is labeled conjunction.

And one could either pre-train again using a self-supervised model, where you predict the next word given the preceding words, and then take the exact same hidden state that's passed out of the neural net to the next iteration of it, and feed that same hidden state into another simple neural net that predicts the output label using a softmax and some other functions. Or one can train this directly as an entirely supervised model. This is a fairly easy model to train because there's a direct mapping from each word to its hidden state, given the preceding hidden state, to the label.

And this is used a lot for labeling, for example, named entities. Is this word referring to a person? Is this word referring to a country? Is this word referring to a city? Is this word referring to none of the above? So labeling every word with what sort of thing it is.
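To make the architecture concrete, here is a minimal sketch of such a tagger in PyTorch, trained in the directly supervised fashion described above. The class name RNNTagger, the toy vocabulary, and the tag set are illustrative assumptions, not details from the lecture.

```python
import torch
import torch.nn as nn

class RNNTagger(nn.Module):
    def __init__(self, vocab_size, tag_count, embed_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # The recurrent layer carries a hidden state from each word
        # to the next, as described above.
        self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
        # A simple linear layer maps each word's hidden state to tag
        # scores; a softmax over these scores gives the predicted label.
        self.out = nn.Linear(hidden_dim, tag_count)

    def forward(self, word_ids):
        embedded = self.embed(word_ids)        # (batch, seq, embed_dim)
        hidden_states, _ = self.rnn(embedded)  # one hidden state per word
        return self.out(hidden_states)         # (batch, seq, tag_count)

# Toy example: tag "I went to" with part-of-speech labels.
words = {"I": 0, "went": 1, "to": 2, "and": 3}
tags = {"PRON": 0, "VERB": 1, "PREP": 2, "CONJ": 3}

model = RNNTagger(vocab_size=len(words), tag_count=len(tags))
sentence = torch.tensor([[words["I"], words["went"], words["to"]]])
gold = torch.tensor([[tags["PRON"], tags["VERB"], tags["PREP"]]])

# Direct supervised training: cross-entropy compares each word's
# predicted tag distribution (the softmax is folded into the loss)
# against its gold label.
scores = model(sentence)
loss = nn.functional.cross_entropy(scores.view(-1, len(tags)), gold.view(-1))
loss.backward()
```

In the pre-trained variant, the weights of the recurrent layer would instead come from a self-supervised language model trained to predict the next word, and only the final output layer would need to be learned from the labeled tagging data.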