It's been like this for the past 10 years. Just learn the basics really well and know that stuff that's important will keep coming up over time. A ton of the breakthroughs are only relevant for a few weeks before they're superseded by a new "breakthrough".
Most of the recent work has been about scaling up transformers as much as possible and throwing as much data at them as we can, so it mostly comes down to making transformers more efficient and distributing them across large clusters.