What is BERT? A stack of transformer encoders, a stack of transformer decoders, a stack of unidirectional LSTM cells, or a stack of bidirectional LSTM cells? I'll give you a sec. The correct answer is A, a stack of transformer encoders. BERT stands for Bidirectional Encoder Representations from Transformers. It is pre-trained on two tasks: masked language modeling and next sentence prediction. Through this pre-training, BERT gains a deep understanding of the context and meaning of each word, resulting in richer contextual word embeddings. It can then be fine-tuned on specific downstream tasks. For more information, you can watch this video on BERT.
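
If you want to see masked language modeling in action, here is a minimal sketch, assuming the Hugging Face `transformers` library and the `bert-base-uncased` checkpoint (not mentioned in the video, just a common choice for illustration). It loads a pre-trained BERT and asks it to fill in a masked word, which is exactly the pre-training objective described above.

```python
import torch
from transformers import BertTokenizer, BertForMaskedLM

# Load a pre-trained BERT with a masked-language-modeling head.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

# A sentence with one token hidden behind the special [MASK] token.
text = "The capital of France is [MASK]."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, vocab_size)

# Locate the [MASK] position and take the highest-scoring vocabulary entry.
mask_index = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_index].argmax(dim=-1)
print(tokenizer.decode(predicted_id))  # typically prints "paris"
```

Fine-tuning works the same way in spirit: you start from these pre-trained weights and continue training on your own labeled data, for example with a classification head on top of the encoder stack.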