I feel like that would strike reasonable balance. Client are still simple (arguably more simple since they don't have to guess if they got everything) and the protocol is still trivial. For dynamic content, it would increase time-to-first-byte and ram usage on server, but both imho would not be an issue for the type of content gemini aims for.