This is unrelated to the current discussion and not meant to contradict your "Your point is moot"; it is just a hopefully useful anecdote. In my experience, this requires the sender to choose 'Keep' too. I have been bitten several times after sending an audio message to my wife because I was in a situation where typing was difficult for me (outdoors, plenty of sunshine, and I don't have the best eyesight), only to find that she never even got to see it because it deleted itself after a few minutes.
My conjecture, from how this has worked for me, is that the sender must choose 'Keep' so that the audio message stays on the recipient's phone until it is listened to, and the recipient must choose 'Keep' so that the audio message stays on their phone after listening.
I, of course, have no proof of this other than my own experience on devices that are a few years old (iPhone 5 and 6).