Not arguing that; I'm just saying I don't know that KL divergence causes this or is responsible for it, and I haven't seen any compelling argument that increasing the KL term would fix it.
There's no question the OP found a legit issue. The questions are more like: