> and then ask it "why did you just now give that customer mountain dew when they ordered sprite?"
Worse than useless for debugging.
An LLM can't introspect on its own earlier output; it has no capacity for self-reflection.
It will just generate a plausible-sounding stream of tokens in reply, which may or may not correspond to the actual cause.
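A toy sketch of why this fails (hypothetical stand-in code, not a real LLM API): the follow-up "why?" is just another completion conditioned on the visible transcript, and the sampling state that actually produced the wrong drink was discarded the moment the reply was emitted.

```python
import random

def take_order(drink_requested: str, rng: random.Random) -> str:
    # Toy agent: occasionally a sampling glitch substitutes the
    # wrong item. The rng state that caused it is never persisted.
    if rng.random() < 0.1:
        return "mountain dew"
    return drink_requested

def explain(transcript: list[str]) -> str:
    # The "why did you do that?" turn sees only the text transcript.
    # Nothing here can inspect the sampler state behind the earlier
    # reply, so the answer is confabulated from the text alone.
    return "I thought you said mountain dew."  # plausible, not causal

rng = random.Random(7)
served = take_order("sprite", rng)
transcript = ["customer: sprite", f"agent: served {served}"]
print(explain(transcript))
```

The point of the sketch: `explain` is structurally incapable of returning the real reason, because that reason (the dice roll inside `take_order`) was never part of its input.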