The disquieting factor is that the network is identifying patients' race using a signal that humans can't see. It's one thing if there are actual biological differences that should be factored into treatment... It's another if the AI is picking up on race differences that a human can't explain, and as a result might give different answers to other questions because it's factoring in a race signal that we can't see. We wouldn't trust "If this patient were white, I'd give this answer, but since this patient is black I'm going to give that answer... Can't tell you why, my experience just indicates to me that's correct" coming from a human doctor, so we definitely won't accept it coming from a machine. Not without a concrete reason the answer should be different if the patient is black.
If I understand correctly, the paper's recommendation is for follow-up research to understand what signal the network is actually keying in on. There might be an actual biological difference. This paper wasn't able to identify it.