GPT explained in 60 seconds. Transformers were introduced for sequence-to-sequence modeling, where a sequence is an ordered list of tokens; in language, sentences are sequences of words. A Transformer consists of an encoder and a decoder, each expressive enough to learn useful representations of language on its own. Stacking the encoders gives BERT; stacking the decoders gives GPT. Because training a language task from scratch requires a lot of data, GPT relies on transfer learning: the model is first pre-trained on language modeling to learn the fundamentals of language, and can then be fine-tuned for a specific task, which requires far less data. GPT-2 and GPT-3 go a step further and largely replace the fine-tuning phase with in-context (meta-)learning, performing a host of tasks from only a few examples, or none at all.
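To make the contrast between fine-tuning and few-shot prompting concrete, here is a minimal sketch. It assumes the Hugging Face `transformers` library and the public "gpt2" checkpoint are available (neither is mentioned in the text above); the translation prompt is just an illustrative example of in-context learning, not an official recipe.

```python
# Minimal sketch: few-shot prompting of a pre-trained decoder-only model,
# with no gradient updates. Assumes `pip install transformers torch`.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")  # pre-trained on language modeling

# Few-shot prompt: two worked examples, then the query we want completed.
prompt = (
    "Translate English to French:\n"
    "sea otter => loutre de mer\n"
    "cheese => fromage\n"
    "peppermint =>"
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=5,                  # only a few tokens needed for the answer
    do_sample=False,                   # greedy decoding keeps the sketch deterministic
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

With a large model like GPT-3 this kind of prompting alone is often enough; the small GPT-2 checkpoint will complete the query less reliably, but the workflow is the same: no fine-tuning, only examples placed in the prompt.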