Yeah, I agree. I think that what3words hasn't spent enough effort on this, or perhaps is suffering from trying to cram everywhere into 3 words, which means the word list needs to be unmanageably large.
Even in my own attempt at the problem, I ran various experiments on the word list, but an ideal attempt would also check for similarity across common accents, etc., and I certainly wasn't able to do that.
Having said that, I think it's a valid and realistic goal for good word encoder systems to aim for good roundtripability via voice or memory.
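
To make the word-list point concrete, here is a minimal sketch of one kind of check such an experiment might include: greedily dropping any word that is within one edit of a word already kept. This is purely illustrative and is not the method used by what3words or in the attempt described above. The function names, the threshold, and the sample words are all hypothetical. Edit distance on spelling is only a crude proxy for how words sound, and it says nothing about accents.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]


def filter_confusable(words, min_distance=2):
    """Keep a word only if it is at least min_distance edits from every kept word.

    Greedy and order-dependent: earlier words win. The default threshold
    of 2 is an arbitrary assumption chosen for illustration.
    """
    kept = []
    for word in words:
        if all(edit_distance(word, k) >= min_distance for k in kept):
            kept.append(word)
    return kept


# Hypothetical sample list; a real list would be tens of thousands of words.
candidates = ["cat", "cap", "dog", "dot", "house", "horse", "river"]
print(filter_confusable(candidates))
```

The pairwise comparison is quadratic in the list size, so it gets slow on a large list, and every word it rejects makes a fixed three-word scheme harder to fill. A check that accounted for pronunciation would need phonetic transcriptions rather than spellings.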