3Every Model Learned by Gradient Descent Is Approximately a Kernel Machine (opens in new tab)(arxiv.org)arXiv406scottlocklin5y ago107Save