Okay. What did we learn this week? Well, we looked at time series and saw that they are sequences of variable length, which requires doing something to fit them into a neural net. A simple method that's often used is to truncate and pad so that every sequence has the same length. People sometimes use bags of words, just counting how often each word occurs, but mostly people use recurrent neural networks.

We saw that embeddings capture distributional similarity: words that show up in the same contexts (in the same documents, with the same words before or after them) tend to get similar embeddings. We will see a variety of more sophisticated embeddings next week.

We saw several different architectures for recurrent neural nets. We noted that all of them require unrolling and backpropagation through time, which can make gradient descent a little slow or unstable. We saw that plain RNNs tend to forget, so people mostly use gated recurrent units, LSTMs, or, if you know the future as well as the past, bi-LSTMs, all of which forget less. We'll see methods like attention that deal with this next week.

And we saw that language models, which predict the next word given the preceding words, are the basis for lots of networks, because you can train them on lots of unlabeled data and then use them to label sequences. Is this post friendly or hostile? Is this a purchase offer or not? One can tag the words: is this a person? Is this a place? Is this a movie? Is this a book? One can do sequence-to-sequence using encoder-decoder methods. This forms the basis for lots of really cool stuff we'll see next week for natural language generation, having computers that can tell stories. So there are lots of variations on the themes, but all of them tend to build upon these core recurrent neural networks. A few minimal code sketches of these ideas follow below.
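To make "truncate and pad" and "bag of words" concrete, here is a minimal sketch in plain Python. The function names and the pad_id=0 convention are my own choices for illustration, not from the lecture:

```python
from collections import Counter

def truncate_and_pad(seq, max_len, pad_id=0):
    """Force a token-id sequence to exactly max_len: cut if too long, pad on the right if too short."""
    seq = seq[:max_len]
    return seq + [pad_id] * (max_len - len(seq))

def bag_of_words(tokens):
    """The even simpler baseline: ignore order entirely and just count word occurrences."""
    return Counter(tokens)

# Example: three sequences of different lengths all become length 5.
batch = [[4, 9, 2], [7, 7, 1, 3, 8, 6, 2], [5]]
print([truncate_and_pad(s, 5) for s in batch])
# [[4, 9, 2, 0, 0], [7, 7, 1, 3, 8], [5, 0, 0, 0, 0]]
print(bag_of_words(["good", "movie", "good", "acting"]))
# Counter({'good': 2, 'movie': 1, 'acting': 1})
```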
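The distributional-similarity claim about embeddings is usually measured with cosine similarity between embedding vectors. A small numpy sketch, where the toy four-dimensional vectors are made up for illustration (real learned embeddings have hundreds of dimensions):

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors: near 1.0 = very similar direction."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

cat = np.array([0.9, 0.1, 0.3, 0.0])
dog = np.array([0.8, 0.2, 0.4, 0.1])
car = np.array([0.0, 0.9, 0.1, 0.8])
print(cosine_similarity(cat, dog))  # high (~0.98): they appear in similar contexts
print(cosine_similarity(cat, car))  # much lower (~0.10)
```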
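"Unrolling" just means the recurrence is applied once per time step, so gradients have to flow back through every step (backpropagation through time). A minimal numpy sketch of the forward unroll for a vanilla RNN; the shapes and names are my own:

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, b_h):
    """Unroll h_t = tanh(W_xh @ x_t + W_hh @ h_{t-1} + b) over a sequence of input vectors xs."""
    h = np.zeros(W_hh.shape[0])   # h_0: initial hidden state
    states = []
    for x in xs:                  # one copy of the cell per time step: the "unrolled" network
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        states.append(h)
    return states                 # BPTT walks this list backwards; the long product of
                                  # Jacobians is exactly why gradients vanish or explode

rng = np.random.default_rng(0)
T, d_in, d_h = 20, 8, 16
xs = rng.normal(size=(T, d_in))
hs = rnn_forward(xs, rng.normal(size=(d_h, d_in)) * 0.1,
                 rng.normal(size=(d_h, d_h)) * 0.1, np.zeros(d_h))
print(len(hs), hs[-1].shape)      # 20 (16,)
```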
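Gated units, and bidirectional LSTMs when the whole sequence is available up front, come ready-made in deep learning libraries. A sketch of a bi-LSTM tagger in PyTorch, of the kind you'd use for the person/place/movie/book tagging mentioned above; all the sizes are placeholder assumptions:

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    """Embeds token ids, runs a bidirectional LSTM, and emits one label score per token."""
    def __init__(self, vocab_size=10_000, emb_dim=128, hidden=256, num_tags=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        # bidirectional=True: one LSTM reads left-to-right, another right-to-left,
        # so each position sees both past and future -- less forgetting at either end.
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.tag_head = nn.Linear(2 * hidden, num_tags)  # 2*hidden: both directions concatenated

    def forward(self, token_ids):                    # (batch, seq_len) int64 ids
        states, _ = self.lstm(self.embed(token_ids))  # (batch, seq_len, 2*hidden)
        return self.tag_head(states)                 # (batch, seq_len, num_tags)

model = BiLSTMTagger()
scores = model(torch.randint(1, 10_000, (4, 12)))  # 4 padded sentences of length 12
print(scores.shape)                                # torch.Size([4, 12, 5])
```

For whole-sequence labels like friendly-versus-hostile, you would pool the per-token states (for instance, take the final hidden state) and feed that to the classification head instead of scoring every token.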
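A language model in this sense is just a network trained to score the next word given the preceding ones; since the "labels" are the words themselves, any raw text is training data. A minimal sketch of that training objective, with placeholder architecture choices:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyLM(nn.Module):
    """Predict token t+1 from tokens 0..t with an LSTM."""
    def __init__(self, vocab_size=10_000, emb_dim=128, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, token_ids):
        states, _ = self.lstm(self.embed(token_ids))
        return self.out(states)        # (batch, seq_len, vocab): next-word scores at every position

model = TinyLM()
tokens = torch.randint(0, 10_000, (8, 33))
logits = model(tokens[:, :-1])         # predict from all but the last token...
loss = F.cross_entropy(logits.reshape(-1, 10_000),
                       tokens[:, 1:].reshape(-1))  # ...against the text shifted by one
print(loss.item())                     # no human labels needed: the text supervises itself
```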
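Finally, the encoder-decoder pattern for sequence-to-sequence: one RNN compresses the input into a final state, and a second RNN generates output tokens starting from that state. A bare-bones sketch with assumed sizes, and with no attention yet (that comes next week):

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Encoder LSTM summarizes the source; decoder LSTM generates the target from that summary."""
    def __init__(self, src_vocab=8_000, tgt_vocab=8_000, emb_dim=128, hidden=256):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, emb_dim)
        self.tgt_embed = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # The encoder's final (h, c) pair is the entire summary of the input --
        # this fixed-size bottleneck is what attention will later relieve.
        _, final_state = self.encoder(self.src_embed(src_ids))
        dec_states, _ = self.decoder(self.tgt_embed(tgt_ids), final_state)
        return self.out(dec_states)    # (batch, tgt_len, tgt_vocab)

model = Seq2Seq()
scores = model(torch.randint(0, 8_000, (2, 15)), torch.randint(0, 8_000, (2, 11)))
print(scores.shape)                    # torch.Size([2, 11, 8000])
```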