Paper: https://arxiv.org/abs/2104.05704
Blog: https://medium.com/pytorch/training-compact-transformers-fro...
CPU compute: https://twitter.com/WaltonStevenj/status/1382045610283397120
Crazy optimizations (no affiliation): 94% on CIFAR-10 in <6.3 seconds on a single A100: https://github.com/tysam-code/hlb-CIFAR10
I also want to share some better general resources on ViTs. Lucas Beyer is a good source and has some lectures, as are Hila Chefer and Sayak Paul's tutorials. Also, just follow Ross Wightman; the man is a beast.
Lucas Beyer: https://twitter.com/giffmana/status/1570152923233144832
Chefer & Paul's All Things ViT: https://all-things-vits.github.io/atv/
Ross Wightman: https://twitter.com/wightmanr
His very famous timm package: https://github.com/huggingface/pytorch-image-models
Though this isn't groundbreaking research as of this week, I think that with the pace of AI it's important to dive deep into past work and what others have tried! It's nice to take a step back and learn the fundamentals as well as keep up with the latest and greatest.
Posted the notes and recap here in case anyone finds them helpful:
https://blog.oxen.ai/arxiv-dives-vision-transformers-vit/
Also would love to have anyone join us live on Fridays! We've got a pretty consistent and fun group of 300+ engineers and researchers showing up.