Hello HN, author here. This post is my best attempt at visually explaining some of the leading NLP models that came out this year, along with some of the context surrounding them. Since BERT relies on the Transformer model, this post builds on my earlier post, The Illustrated Transformer:
https://jalammar.github.io/illustrated-transformer/

Hope you find it useful. Feedback is much appreciated!