You can't just assert "we win" or "we lose"; you need to measure different the different possible implementations for your particular application. In the paper I wrote with Linda Torczon (cited by Russ Cox), we did exactly that. In that application (a graph coloring register allocator), we were very sparse (n was significantly less than m) and we cleared the working set quite often - the win was pretty significant.