I share the same experience in Japan. Once in a blue moon someone will just not be able to process a white face speaking Japanese.
I think it's because languages are all about expectations. If some piece of information has primed me to hear one version of a homophone, the other meanings never even pop into my head.
I've noticed a related effect while watching videos on YouTube recently. I'm a native British English speaker. When I click on a random YouTube clip I don't know what accent I'm going to get. If it's a strong accent I don't hear often, it usually takes 5 seconds or so for my brain to 'work out' the accent. During that time I don't really understand what is being said. As soon as it clicks I can rewind the clip and hear the exact same audio with perfect understanding.
I think when the Japanese people don't understand me a similar thing is happening. They see my white face so are trying very hard to hear English. The sounds come out and are run through the 'process the English' part of their brain. Of course, they can't make heads or tails of it so end up not understanding anything.
I'm not a linguist, but I'd love to know if anyone has any references on this effect.