I’m using it through Kagi, which doesn’t use DeepSeek’s official API [1]. That limitation from the docs seems to apply everywhere, though.
In practice, I don’t think anyone can economically host the full model plus the KV cache for the entire 128k context window (and I’m skeptical of DeepSeek’s claims now anyway).
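For a rough sense of why, here’s a back-of-envelope sketch of the cache cost alone, assuming DeepSeek-V3’s published architecture (61 layers, MLA caching a 512-dim compressed KV latent plus a 64-dim decoupled RoPE key per token per layer); these figures are my assumptions about the deployment, not anything from Kagi:

    # Back-of-envelope KV-cache sizing, under the assumptions above.
    LAYERS = 61
    KV_LATENT_DIM = 512   # compressed KV latent (c_KV) cached by MLA
    ROPE_KEY_DIM = 64     # decoupled RoPE key (k_R) cached alongside it
    BYTES_PER_ELEM = 2    # fp16/bf16 cache
    CONTEXT = 128_000     # full advertised context window

    bytes_per_token = LAYERS * (KV_LATENT_DIM + ROPE_KEY_DIM) * BYTES_PER_ELEM
    per_request_gb = bytes_per_token * CONTEXT / 1e9

    print(f"{bytes_per_token / 1024:.1f} KiB per token")       # ~68.6 KiB
    print(f"{per_request_gb:.1f} GB per full-context request") # ~9.0 GB

Even with MLA’s compression, that’s on the order of 9 GB of cache per concurrent full-context request, on top of the ~671B-parameter weights, and it scales linearly with the number of concurrent users at full context. That’s where I suspect the economics fall apart.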
Edit: a Kagi team member just said on Discord that they’ll be increasing max tokens in the next release.
[1] https://help.kagi.com/kagi/ai/llms-privacy.html