This is what the MMAP_THRESHOLD tunable solves. Allocations larger than that many bytes are each served by their own mmap(), which can then be munmap()ed independently of the rest of the heap.
I use env MALLOC_MMAP_THRESHOLD_=65536 to reduce the RAM my program wastes on memory fragmentation from 6.5 GB to 0.8 GB.
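If you would rather set this from inside the program than through the environment, glibc's mallopt() takes the same setting as M_MMAP_THRESHOLD. A minimal sketch, assuming glibc; set_mmap_threshold is a hypothetical helper name used only for illustration, not something from my program:

```c
#include <malloc.h>  /* glibc-specific: mallopt(), M_MMAP_THRESHOLD */

/* Hypothetical helper: ask glibc to serve allocations at or above `bytes`
 * via their own mmap(). Returns 1 on success and 0 on error, which is
 * mallopt()'s own return convention. */
int set_mmap_threshold(int bytes)
{
    return mallopt(M_MMAP_THRESHOLD, bytes);
}
```

Call it early in main(), e.g. set_mmap_threshold(65536), before the allocations you care about. Note that setting the threshold explicitly (either way) turns off glibc's dynamic adjustment of it.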
The benefit is that you don't have to decide at which points to call malloc_trim(). But it's expected to be a bit slower, because mmap() takes a while. Choosing between malloc_trim() and MALLOC_MMAP_THRESHOLD_ is analogous to choosing between GC and reference counting: higher memory use for a while plus having to choose when to clean up, versus a higher per-operation cost.
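For comparison, the malloc_trim() route looks roughly like this. It is only a sketch, assuming glibc; run_batch_and_trim and its parameters are hypothetical, and where exactly you put the trim call is the decision you have to make yourself:

```c
#include <malloc.h>  /* glibc-specific: malloc_trim() */
#include <stdlib.h>
#include <string.h>

/* Hypothetical batch job: allocate and touch many blocks, free them all,
 * then explicitly ask glibc to hand free memory back to the OS.
 * Returns malloc_trim()'s result (1 if memory was released, 0 if not),
 * or -1 if the bookkeeping array could not be allocated. */
int run_batch_and_trim(size_t count, size_t block_size)
{
    void **blocks = malloc(count * sizeof *blocks);
    if (blocks == NULL)
        return -1;

    for (size_t i = 0; i < count; i++) {
        blocks[i] = malloc(block_size);
        if (blocks[i] != NULL)
            memset(blocks[i], 0, block_size);  /* touch the pages */
    }

    for (size_t i = 0; i < count; i++)
        free(blocks[i]);  /* free(NULL) is a no-op */
    free(blocks);

    /* The "when to clean up" decision: here, right after the batch. */
    return malloc_trim(0);
}
```

With the threshold approach there is no equivalent of that last call; each large block goes back to the OS at its own free().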