If you wanted to avoid the "only the latest getenv pointer per thread is valid" restriction, the thread-local data structure could be a map from variable name to buffer rather than a single reused buffer.
The worst-case memory usage (every thread reads every variable) is a separate copy of the environment per thread, but that seems to be the best that can be done given the awful API.
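
For what it's worth, here is a rough C++ sketch of the per-thread map idea. The names `safe_getenv` and `env_mutex` are made up for illustration and are not part of any library. It also assumes that every environment write in the program (setenv, putenv, unsetenv) takes the same mutex, since otherwise the underlying getenv call can still race.

```cpp
#include <cstdlib>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical lock. Assumption: all code that modifies the environment
// holds this mutex too; the sketch cannot enforce that.
std::mutex env_mutex;

// Hypothetical wrapper around getenv that returns a pointer into a
// per-thread copy of the value instead of into the shared environment.
const char* safe_getenv(const char* name) {
    // One map per thread: variable name -> private copy of its value.
    thread_local std::unordered_map<std::string, std::string> cache;

    std::lock_guard<std::mutex> lock(env_mutex);
    const char* value = std::getenv(name);
    if (value == nullptr) {
        // Variable is unset: drop any stale copy this thread was holding.
        cache.erase(name);
        return nullptr;
    }

    // unordered_map never moves its elements on rehash, so the buffers
    // belonging to other variable names stay where they are.
    std::string& slot = cache[name];
    slot.assign(value);
    return slot.c_str();
}
```

With this, a returned pointer stays valid until the same thread asks for the same variable again or the thread exits, so a thread can hold pointers to several different variables at once. The cost is the one mentioned above: each thread ends up with its own copy of every variable it has read.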