I'd recommend using the Ollama or vLLM inference servers. Both can constrain output to valid JSON (Ollama via its `format: "json"` option, vLLM via the OpenAI-compatible `response_format` parameter), implemented as grammars on top of the base model. This makes JSON output reliable enough for production use, though in my experience response quality decreases slightly when a grammar is applied.
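
For example, with vLLM serving its OpenAI-compatible endpoint, something like the sketch below should work. It's a minimal example, not a definitive setup: the base URL, port, model name, and prompt are placeholders for whatever your local server is running.

```python
# Minimal sketch: JSON-constrained output against a local vLLM server
# exposing the OpenAI-compatible API (base URL/model are assumptions).
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # local vLLM server (placeholder)
    api_key="EMPTY",  # local servers usually ignore the key, but the client requires one
)

resp = client.chat.completions.create(
    model="meta-llama/Llama-3-8B-Instruct",  # placeholder: whatever model you serve
    response_format={"type": "json_object"},  # ask the server to constrain output to valid JSON
    messages=[
        {"role": "system", "content": "Reply only with a JSON object."},
        {"role": "user", "content": "List three primary colors under the key 'colors'."},
    ],
)
print(resp.choices[0].message.content)
```

With Ollama the idea is the same, just through its own API (`format: "json"` in the request body) instead of the OpenAI-style parameter.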