We now turn to some of the different RNN architectures that are used, particularly for natural language processing. One choice is whether to use a simple RNN or a gated recurrent network such as an LSTM. We're not going to look at that right now; instead we'll look at a variety of architectures that depend on the structure of the input and of the output.

Language models, which are the basis upon which almost everything is built, take in a sequence of words, a subsequence like "I went to the," and predict the next word: "store," as in "I went to the store yesterday." Sequence labeling takes in a sequence of words, "I went to the store," and outputs a label: shopping, or travel, or study, or not safe for work, or hate speech, which we'll look at later. So that's sequence to label. It's also not uncommon in NLP to label each individual token in the sequence: in "I went," the word "I" gets labeled as a pronoun, "went" as a verb, and so forth; we label each item in the sequence. And finally, sequence to sequence: taking a whole sequence of words, "I went to the store," and outputting a different sequence, a translation into German or Chinese, or a summary, or some other transformation of the sequence. We'll explore each of these in sequence, as it were.

The core one, the language model, has the structure we've seen before. Take in the first word, take its embedding, and feed it into a neural net along with a null hidden state; out comes a new hidden state, which goes into the next copy of the neural net. This is the network unrolled in time: the embedding of the next word goes in, and so on. At the end of the subsequence, whatever it is, we predict what the next word is going to be. That's a language model. It's great because it doesn't require any labels: all you need is a billion words of text, and you always just predict the next word given the most recent words you've seen. So it's unsupervised, or self-supervised; it does not need extra labels.

Given a model you've trained that way as a language model, you can then take that exact same model, run it forward over a sequence of words, take the hidden state at the end, feed that hidden state into a very simple neural net, and predict a label: is this sentence about travel? Is it about shopping? Is it about friendship? It's a way to map a sequence to a hidden state, and the hidden state to a label. You could also skip the self-supervised training and go straight from sequence to label, but that's usually harder because you don't have lots of labels.
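As a concrete illustration, here is a minimal sketch of that unrolled language model. The lecture names no framework, so PyTorch, the layer sizes, and all the names here are assumptions for illustration, not anything fixed by the material:

```python
import torch
import torch.nn as nn

class RNNLanguageModel(nn.Module):
    # Hypothetical sizes; the lecture does not specify any.
    def __init__(self, vocab_size=10_000, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # batch_first=True: inputs are (batch, seq_len, features)
        self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
        self.predict = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):
        # tokens: (batch, seq_len) word indices, e.g. "I went to the"
        embeddings = self.embed(tokens)          # (batch, seq_len, embed_dim)
        outputs, _ = self.rnn(embeddings)        # a hidden state at every step
        return self.predict(outputs[:, -1, :])   # logits for the next word

model = RNNLanguageModel()
logits = model(torch.randint(0, 10_000, (1, 4)))  # a four-word prefix
# The training target is simply the actual next word ("store"), via
# cross-entropy -- the text itself supplies the labels, which is why
# this counts as self-supervised.
```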
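The reuse step described above, mapping a sequence to a hidden state and the hidden state to a label, then looks roughly like this sketch, which builds on the `RNNLanguageModel` above. The classifier head and the label set are hypothetical:

```python
class SequenceClassifier(nn.Module):
    """Sequence -> label, reusing the trained language model's weights."""
    def __init__(self, pretrained_lm, num_labels=3):  # e.g. travel/shopping/friendship
        super().__init__()
        self.embed = pretrained_lm.embed    # trained embedding, reused as-is
        self.rnn = pretrained_lm.rnn        # trained recurrent net, reused as-is
        # "Very simple neural net" on top: a single linear layer here
        self.classify = nn.Linear(self.rnn.hidden_size, num_labels)

    def forward(self, tokens):
        embeddings = self.embed(tokens)
        _, final_hidden = self.rnn(embeddings)         # hidden state at the end
        return self.classify(final_hidden.squeeze(0))  # label logits
```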
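The per-token labeling variant differs only in that the classifier is applied to the hidden state at every position rather than just the last one. Again a hedged sketch reusing the same pretrained pieces; the tag count of 17 assumes something like the Universal Dependencies part-of-speech tag set:

```python
class TokenTagger(nn.Module):
    """Per-token labeling: one label per position, e.g. POS tagging."""
    def __init__(self, pretrained_lm, num_tags=17):
        super().__init__()
        self.embed = pretrained_lm.embed
        self.rnn = pretrained_lm.rnn
        self.tag = nn.Linear(self.rnn.hidden_size, num_tags)

    def forward(self, tokens):
        outputs, _ = self.rnn(self.embed(tokens))  # one hidden state per token
        return self.tag(outputs)                   # (batch, seq_len, num_tags)
```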