This is almost certainly a server-side session-isolation bug rather than an LLM hallucination - the model is returning a response to someone else's prompt. DeepSeek's cloud endpoints have had documented issues with request routing under load.
That said, it highlights something important: whether the failure mode is hallucination, crosstalk, or retrieval contamination, production systems need output validation that doesn't just check "did the model generate text?" but "is this response actually relevant and correct for this specific query?"
For anyone running cloud-hosted models in production (especially for anything beyond hobby use), a few practical safeguards:
1. Semantic coherence check - verify the response is topically related to your prompt. A simple cosine similarity between prompt and response embeddings catches gross crosstalk, like a medical answer to a coding question.
2. Instruction adherence scoring - evaluate whether the output actually follows the system prompt and user instructions. This catches both hallucinations and routing errors.
3. Ground truth verification for RAG - if you're passing context documents, verify the response is grounded in those documents and not pulling from the cached state of another request.
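To make item 1 concrete, here's a minimal sketch of a coherence check. It uses a toy bag-of-words vectorizer as a stand-in for a real embedding model (in production you'd embed with your actual model or an embedding API), and the 0.2 threshold is an illustrative assumption, not a tuned value:

```python
import math
import re
from collections import Counter

def toy_embed(text: str) -> Counter:
    # Stand-in for a real embedding model: bag-of-words token counts.
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def coherence_check(prompt: str, response: str, threshold: float = 0.2) -> bool:
    # Flag responses that share almost no vocabulary with the prompt.
    return cosine(toy_embed(prompt), toy_embed(response)) >= threshold

prompt = "How do I reverse a linked list in Python?"
on_topic = "You can reverse a linked list in Python by iterating and swapping pointers."
crosstalk = "Take two tablets daily with food and consult your doctor about dosage."

print(coherence_check(prompt, on_topic))   # True with this toy embedder
print(coherence_check(prompt, crosstalk))  # False: zero vocabulary overlap
```

Real embeddings would catch paraphrased crosstalk that shares no surface vocabulary; the bag-of-words version only catches the grossest cases, which is exactly the failure mode described here.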
This is exactly the class of failure that makes self-hosted models attractive for anything with real stakes. When you don't control the inference server, you inherit all its bugs. If you must use cloud inference, wrapping it with an output evaluation layer is the minimum.
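As a sketch of what that evaluation layer might look like: a thin wrapper that runs a list of checks against the raw completion and refuses to return unvalidated output. `call_model`, the check list, and the retry count are all placeholders for whatever your stack actually uses:

```python
from typing import Callable

Check = Callable[[str, str], bool]  # (prompt, response) -> passed?

def validated_completion(call_model: Callable[[str], str],
                         prompt: str,
                         checks: list[Check],
                         max_attempts: int = 2) -> str:
    # Retry on validation failure; raise instead of returning bad output.
    for _ in range(max_attempts):
        response = call_model(prompt)
        if all(check(prompt, response) for check in checks):
            return response
    raise RuntimeError("model output failed validation; not returning it")

# Usage with a stubbed model and a trivial non-empty check:
def fake_model(prompt: str) -> str:
    return "stub answer about " + prompt

not_empty: Check = lambda p, r: bool(r.strip())
print(validated_completion(fake_model, "linked lists", [not_empty]))
```

The coherence, adherence, and grounding checks from the list above all fit the same `(prompt, response) -> bool` shape, so they compose into one `checks` list without the wrapper knowing anything about them.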