My benchmark is the "runaway effect". At some point we'll be able to point an AI at a Python environment and TensorFlow, and it'll improve the algorithms used to train it. And at some point after that, it'll start suggesting improvements (or whole new designs) for AI accelerator hardware. I might be wrong, but ChatGPT makes me think we aren't far off.
I'm curious what the model from this Diff Models paper would do with TensorFlow's source code. Can it already suggest improvements?