This is almost certainly a server-side session-isolation bug rather than an LLM hallucination - the model is returning a response to someone else's prompt. DeepSeek's cloud endpoints have had documented issues with request routing under load.
That said, it highlights something important: whether the failure mode is hallucination, crosstalk, or retrieval contamination, production systems need output validation that doesn't just check "did the model generate text?" but "is this response actually relevant and correct for this specific query?"
For anyone running cloud-hosted models in production (especially for anything beyond hobby use), a few practical safeguards:
1. Semantic coherence check - verify the response is topically related to your prompt. A simple cosine similarity between prompt and response embeddings catches gross crosstalk, like a medical answer to a coding question.
2. Instruction adherence scoring - evaluate whether the output actually follows the system prompt and user instructions. This catches both hallucinations and routing errors.
3. Ground truth verification for RAG - if you're passing context documents, verify the response is grounded in those documents and not pulling from the cached state of another request.
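To make item 1 concrete, here's a minimal sketch of a coherence check. It uses a toy bag-of-words vectorizer as a stand-in for a real embedding model (in production you'd embed with your actual model or an embedding API), and the 0.2 threshold is an illustrative assumption, not a tuned value:

```python
import math
import re
from collections import Counter

def toy_embed(text: str) -> Counter:
    # Stand-in for a real embedding model: bag-of-words token counts.
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def coherence_check(prompt: str, response: str, threshold: float = 0.2) -> bool:
    # Flag responses that share almost no vocabulary with the prompt.
    return cosine(toy_embed(prompt), toy_embed(response)) >= threshold

prompt = "How do I reverse a linked list in Python?"
on_topic = "You can reverse a linked list in Python by iterating and swapping pointers."
crosstalk = "Take two tablets daily with food and consult your doctor about dosage."

print(coherence_check(prompt, on_topic))   # True with this toy embedder
print(coherence_check(prompt, crosstalk))  # False: zero vocabulary overlap
```

Real embeddings would catch paraphrased crosstalk that shares no surface vocabulary; the bag-of-words version only catches the grossest cases, which is exactly the failure mode described here.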
This is exactly the class of failure that makes self-hosted models attractive for anything with real stakes. When you don't control the inference server, you inherit all its bugs. If you must use cloud inference, wrapping it with an output evaluation layer is the minimum.
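As a sketch of what that evaluation layer might look like: a thin wrapper that runs a list of checks against the raw completion and refuses to return unvalidated output. `call_model`, the check list, and the retry count are all placeholders for whatever your stack actually uses:

```python
from typing import Callable

Check = Callable[[str, str], bool]  # (prompt, response) -> passed?

def validated_completion(call_model: Callable[[str], str],
                         prompt: str,
                         checks: list[Check],
                         max_attempts: int = 2) -> str:
    # Retry on validation failure; raise instead of returning bad output.
    for _ in range(max_attempts):
        response = call_model(prompt)
        if all(check(prompt, response) for check in checks):
            return response
    raise RuntimeError("model output failed validation; not returning it")

# Usage with a stubbed model and a trivial non-empty check:
def fake_model(prompt: str) -> str:
    return "stub answer about " + prompt

not_empty: Check = lambda p, r: bool(r.strip())
print(validated_completion(fake_model, "linked lists", [not_empty]))
```

The coherence, adherence, and grounding checks from the list above all fit the same `(prompt, response) -> bool` shape, so they compose into one `checks` list without the wrapper knowing anything about them.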