On the other hand, we're nowhere near human-level intelligence in most tasks (Go, poker and image classification notwithstanding), so I can understand the argument for explainability from a practical perspective. I think we're making some good progress in that direction though, and I'll list a few examples below:
1) Attention maps in CNNs can tell us what the net is usually looking at.
2) "Attentive Explanations" use attention mechanisms to point at the object of interest while generating an explanation for VQA tasks. See the paper (warning: PDF): https://arxiv.org/pdf/1612.04757
3) A recent project used a similar explanation mechanism that forced the network to output "what it was thinking" while playing an Atari game.
4) NTMs (Neural Turing Machines) allow weighted memory access, which alleviates the black box issue to some extent.
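
To make point 4 a bit more concrete, here is a minimal pure-Python sketch of a content-based memory read in the spirit of NTMs. It is not taken from any particular implementation: the function names, the toy memory, and the sharpness value beta=5.0 are all assumptions made up for illustration. The idea it tries to show is that the read weights form a distribution over memory rows, so you can print them and see which rows the read drew from.

```python
import math

def cosine_similarity(u, v):
    # Similarity between a memory row and the query key.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)  # small epsilon avoids division by zero

def content_weights(memory, key, beta=5.0):
    # Softmax over scaled similarities; beta (hypothetical value) sharpens the focus.
    scores = [beta * cosine_similarity(row, key) for row in memory]
    max_score = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def read(memory, weights):
    # The read vector is a weighted sum of the memory rows.
    width = len(memory[0])
    return [sum(w * row[j] for w, row in zip(weights, memory)) for j in range(width)]

# Toy memory with three rows; the key is closest to the second row.
memory = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
key = [0.1, 0.9, 0.0]

weights = content_weights(memory, key)
read_vector = read(memory, weights)
print("read weights:", weights)
print("read vector:", read_vector)
```

In this toy case most of the weight lands on the second row, since the key is closest to it. That kind of inspectable weighting is what I mean by alleviating the black box issue "to some extent": you can see where the read went, but not why the controller produced that key.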