If the encoders know what model the decoders will be running, they can improve accuracy. You could pretty easily make a codec that doesn't encode high-resolution detail if the decoder NN will interpolate it correctly.
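To make that idea concrete, here is a rough toy sketch, not a real codec. It assumes a 1-D signal, and it uses plain linear interpolation as a stand-in for the decoder NN; the names `decoder_predict`, `encode`, `decode`, and the tolerance `tol` are all made up for illustration. The encoder runs the same predictor the decoder will run and only stores corrections where the prediction misses by more than `tol`.

```python
def decoder_predict(coarse):
    """Stand-in for the decoder's NN (an assumption): linear interpolation to 2x length."""
    out = []
    for i, v in enumerate(coarse):
        out.append(v)
        nxt = coarse[i + 1] if i + 1 < len(coarse) else v
        out.append((v + nxt) / 2)
    return out


def encode(signal, tol):
    """Keep every other sample, plus residuals only where the predictor is off by more than tol."""
    coarse = signal[::2]
    predicted = decoder_predict(coarse)
    residuals = {
        i: s - p
        for i, (s, p) in enumerate(zip(signal, predicted))
        if abs(s - p) > tol
    }
    return coarse, residuals


def decode(coarse, residuals, n):
    """Run the shared predictor, then apply the stored corrections."""
    out = decoder_predict(coarse)[:n]
    for i, r in residuals.items():
        out[i] += r
    return out


# Hypothetical input: mostly smooth, with one spike the predictor cannot guess.
signal = [0, 1, 2, 3, 4, 9, 6, 7]
coarse, residuals = encode(signal, tol=0.5)
restored = decode(coarse, residuals, len(signal))
```

In this toy run the encoder stores four samples and two corrections instead of eight samples, because the predictor gets the smooth parts right on its own. The catch is visible too: the encoder only works if it knows exactly which predictor the decoder runs.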
That's changing the encoder, and sure, you could do that. But that's basically a new version of the format. It's no longer the JPEG we use today plus ML in the decoder. It's JPEG-ML on both the encoder and decoder side. And at the speed we adopt new image formats... That's going to take ages :(
That makes sense if the goal is lossless compression. Since JPEG is lossy, it is sufficient to consider the Pareto front between quality, compressed size, and encoding/decoding performance.
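For anyone who wants the Pareto front spelled out, here is a minimal sketch. The candidate names and every number are invented purely for illustration, not measurements of any real codec. A candidate is on the front if no other candidate is at least as good on all three axes and strictly better on at least one.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    quality: float    # higher is better (some perceptual score; hypothetical)
    size_kb: float    # lower is better
    decode_ms: float  # lower is better


def dominates(a, b):
    """True if a is no worse than b on every axis and strictly better on at least one."""
    no_worse = (a.quality >= b.quality
                and a.size_kb <= b.size_kb
                and a.decode_ms <= b.decode_ms)
    strictly_better = (a.quality > b.quality
                       or a.size_kb < b.size_kb
                       or a.decode_ms < b.decode_ms)
    return no_worse and strictly_better


def pareto_front(candidates):
    """Keep only the candidates that nothing else dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Made-up numbers, only to show the mechanics.
candidates = [
    Candidate("plain_jpeg", 0.80, 100, 5),
    Candidate("ml_decoder_only", 0.88, 100, 40),
    Candidate("ml_encoder_and_decoder", 0.90, 70, 45),
    Candidate("strictly_worse", 0.75, 120, 50),
]
front = pareto_front(candidates)
```

With these invented numbers, the first three candidates all stay on the front, since each wins on a different axis, and only the last one drops out. Which point on the front you pick is a judgment call about how much decode time you are willing to trade for quality or size.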