This is exactly how I feel. I felt so out of my depth looking at ML architectures, and I could not make any sense of them. I thought perhaps they were inspired by neuroscience for the layers, etc.
But a friend who works on LLMs mentioned that the architectures of large ML models are mostly discovered experimentally, not designed. If that's the case, that's even worse... it means an entire field, one that could perhaps replace me in the future, doesn't even have a knowledge foundation for its breakthroughs; it just goes by experiment. I thought it was only the weights inside the model that evolved, not the architecture itself.
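To make that distinction concrete, here is a minimal PyTorch sketch (all the sizes are arbitrary numbers picked for illustration): the architecture is whatever a human wrote down before training started, and gradient descent only ever touches the weights inside it.

    # Minimal PyTorch sketch, with hypothetical layer sizes.
    # The *architecture* (number of layers, their widths, the
    # choice of ReLU) is fixed by hand before training begins.
    # Training only updates the *weights* inside those layers.
    import torch
    import torch.nn as nn

    # Architectural choices: picked by experiment/intuition, not learned.
    model = nn.Sequential(
        nn.Linear(784, 256),  # layer width: a hyperparameter
        nn.ReLU(),            # activation: a design choice
        nn.Linear(256, 10),
    )

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    x = torch.randn(32, 784)          # a fake batch of inputs
    y = torch.randint(0, 10, (32,))   # fake labels

    # One training step: only model.parameters() (the weights) change;
    # the structure defined above stays exactly as written.
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

A breakthrough like "use attention instead of recurrence" is a change to that hand-written structure, and those changes are apparently found largely by trying things and keeping what works.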
Which body of knowledge do I study, then, and is it even engineering anymore? It feels like something else entirely, and I'm not sure my programming experience applies to it.
The amount of GPU time and capital it takes to evolve such architectures and run such experiments has to be prohibitively expensive.
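A back-of-envelope makes the point; every number below is a guess I'm making up purely for illustration, not a real figure:

    # Rough cost of ONE large training run.
    # All three inputs are hypothetical assumptions.
    gpus = 1024                  # assumed cluster size
    days = 30                    # assumed run length
    price_per_gpu_hour = 2.00    # assumed cloud rate, USD

    cost = gpus * days * 24 * price_per_gpu_hour
    print(f"${cost:,.0f}")       # ~$1,474,560 for a single experiment

If each architectural idea needs a run on that order just to find out whether it works, only a handful of organizations can afford to iterate at all.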