> The model invents new categories (e.g. apartments) and doesn’t stick to the provided list of allowed categories
Can this specific failure mode be solved by providing a grammar that the output must adhere to? (Not sure if Qwen has this feature; it's used, e.g., to ensure the output is parseable JSON.)
Just one question: if I'm running a local model, can I use something other than a context-free grammar? Does it make sense to have something more general, or would it just be too slow?
I guess the only hard constraint is to avoid backtracking, right? That is, to not waste previously emitted tokens.
Yes, you can use constrained decoding: logit masking sets the logits of all invalid tokens in the vocabulary to -inf, which effectively removes them from selection. I believe llama.cpp exposes this by accepting a grammar written in its GBNF format.
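To make the masking idea concrete, here is a minimal sketch in plain Python. It is not how llama.cpp or any particular runtime implements it: the vocabulary, the category list, and the `toy_logits` scoring function are all made up for illustration, and a real setup would take logits from the model. The validity check is an ordinary Python function rather than a context-free grammar, and since every token is checked before it is emitted, nothing is ever backtracked.

```python
import math

def allowed_next_tokens(prefix, vocab, allowed_outputs):
    """Indices of tokens that keep prefix + token a prefix of some allowed output."""
    ok = []
    for i, tok in enumerate(vocab):
        candidate = prefix + tok
        if tok and any(out.startswith(candidate) for out in allowed_outputs):
            ok.append(i)
    return ok

def mask_logits(logits, allowed_ids):
    """Set every logit outside allowed_ids to -inf so it can never be selected."""
    allowed = set(allowed_ids)
    return [x if i in allowed else -math.inf for i, x in enumerate(logits)]

def constrained_greedy_decode(score_fn, vocab, allowed_outputs, max_steps=32):
    """Greedy decoding with a mask applied before each token is emitted."""
    prefix = ""
    for _ in range(max_steps):
        # Stops at the first complete match; if one category were a prefix of
        # another, an explicit end-of-sequence token would be needed instead.
        if prefix in allowed_outputs:
            return prefix
        ids = allowed_next_tokens(prefix, vocab, allowed_outputs)
        if not ids:
            raise ValueError(f"vocabulary cannot extend {prefix!r} to an allowed output")
        masked = mask_logits(score_fn(prefix), ids)
        best = max(range(len(vocab)), key=masked.__getitem__)
        prefix += vocab[best]
    raise ValueError("no allowed output reached within max_steps")

# Hypothetical toy setup: the "model" strongly prefers the invented category.
VOCAB = ["apart", "ments", "hou", "se", "con", "do", "land", "h"]
ALLOWED = {"house", "condo", "land"}

def toy_logits(prefix):
    # Stand-in for a real model call; returns the same scores at every step.
    return [5.0, 4.0, 1.0, 0.5, 2.0, 0.5, 0.0, 0.1]

unconstrained_first = VOCAB[max(range(len(VOCAB)), key=toy_logits("").__getitem__)]
result = constrained_greedy_decode(toy_logits, VOCAB, ALLOWED)
print(unconstrained_first, "->", result)
```

Without the mask, the first token picked would be "apart"; with it, the highest-scoring valid token wins at every step and the output is guaranteed to be one of the allowed categories. The cost is one validity check per vocabulary entry per step, which is where a more general constraint than a grammar could become slow on a real vocabulary.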