What is the evolution of language modeling? We started with recurrent neural networks to process sequence data. The best variants were LSTM cells, which capture longer-range dependencies, and bidirectional recurrent networks, which capture context from both directions. Then transformer networks replaced recurrence with attention for sequence-to-sequence modeling. BERT and GPT, a stack of transformer encoders and a stack of transformer decoders respectively, are pre-trained on large unlabeled corpora and then fine-tuned on much smaller task-specific datasets, so they can solve language tasks without large labeled data requirements. GPT-2 and GPT-3, instead of being fine-tuned, can be adapted in-context with zero, one, or a few examples, a form of meta-learning. InstructGPT and ChatGPT are fine-tuned GPT models that also use reinforcement learning to incorporate human feedback. Super excited for the future. A few sketches below make these steps concrete.
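To make the RNN era concrete, here is a minimal PyTorch sketch (my own illustration, not code from any particular paper; the sizes are arbitrary) of a bidirectional LSTM. Each position's output concatenates a left-to-right pass and a right-to-left pass, so it sees context from both sides:

```python
import torch
import torch.nn as nn

# A bidirectional LSTM reads the sequence in both directions and
# concatenates the two hidden states at every position.
rnn = nn.LSTM(input_size=16, hidden_size=32, bidirectional=True, batch_first=True)

x = torch.randn(1, 10, 16)   # (batch, seq_len, features), toy data
out, (h, c) = rnn(x)
print(out.shape)             # torch.Size([1, 10, 64]) -- 2 directions * 32
```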
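The attention at the heart of the transformer is short enough to sketch directly. This is bare scaled dot-product self-attention, softmax(QK^T / sqrt(d_k)) V, with toy shapes of my choosing:

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5  # pairwise similarities
    weights = F.softmax(scores, dim=-1)            # normalize to a distribution
    return weights @ v                             # weighted sum of the values

# Self-attention over a toy sequence of 5 tokens with 16-dim embeddings.
x = torch.randn(5, 16)
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # torch.Size([5, 16])
```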
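For the pre-train-then-fine-tune recipe that BERT popularized, a rough sketch using the Hugging Face transformers library (the model name, toy sentences, and labels are just for illustration): a pre-trained encoder gets a fresh classification head, and only a small labeled dataset is needed to train it.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load a pre-trained BERT and attach a 2-class classification head.
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# A tiny labeled batch stands in for the small fine-tuning dataset.
batch = tok(["great movie!", "terrible plot"], padding=True, return_tensors="pt")
labels = torch.tensor([1, 0])

out = model(**batch, labels=labels)
out.loss.backward()  # one gradient step of fine-tuning (optimizer omitted)
```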
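And GPT-3-style in-context learning needs no gradient updates at all: the "training" examples live entirely in the prompt, and the model infers the pattern. This few-shot translation prompt follows the pattern shown in the GPT-3 paper:

```python
# Few-shot prompting: the task description and examples are all in the
# prompt; the model is never fine-tuned on them.
prompt = (
    "Translate English to French:\n"
    "sea otter => loutre de mer\n"
    "peppermint => menthe poivrée\n"
    "cheese =>"
)
# A GPT-3-class model would complete this with something like " fromage".
```

Drop the two worked examples and it becomes a zero-shot prompt; keep one and it is one-shot.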