This video explains a legendary paper, BERT. It builds on the Transformer encoder and introduces an innovative way to pre-train language models: masked language modeling. BERT has significantly influenced how people approach NLP problems and has inspired many follow-up studies and BERT variants.
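To make the masked language modeling idea concrete, here is a minimal sketch of the token-masking step in plain Python. It follows the 15% selection rate and the 80/10/10 split ([MASK] / random token / unchanged) described in the BERT paper. The function name `mask_tokens`, the toy vocabulary, and the per-token coin flip are assumptions for illustration; the official implementation differs in details, for example it skips special tokens such as [CLS] and [SEP].

```python
import random

MASK_TOKEN = "[MASK]"


def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Illustrative masking step (hypothetical helper, not the official code).

    Returns (masked_tokens, labels). labels[i] holds the original token at
    positions the model must predict, and None everywhere else.
    """
    if rng is None:
        rng = random.Random()
    masked = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        # Simplification: each position is selected independently with
        # probability mask_prob.
        if rng.random() >= mask_prob:
            continue
        labels[i] = tok  # the model is trained to recover this token
        roll = rng.random()
        if roll < 0.8:
            masked[i] = MASK_TOKEN          # 80%: replace with [MASK]
        elif roll < 0.9:
            masked[i] = rng.choice(vocab)   # 10%: replace with a random token
        # remaining 10%: leave the token unchanged
    return masked, labels


if __name__ == "__main__":
    toy_vocab = ["the", "cat", "sat", "on", "mat", "dog"]
    sentence = ["the", "cat", "sat", "on", "the", "mat"]
    masked, labels = mask_tokens(sentence, toy_vocab, rng=random.Random(0))
    print(masked)
    print(labels)
```

During pre-training, the loss is computed only at positions where `labels` is not None, so the encoder has to use context from both the left and the right to recover the hidden tokens.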
Code
https://github.com/google-research/bert (TensorFlow)
https://github.com/huggingface/transformers (PyTorch)