My colleague, Rémi, created some patches that let you pass vLLM a JSON schema along with the prompt, which dramatically simplifies deploying JSON-guided generation. He also added a new `serve` interface that ties it all together, so serving a model with JSON-guided generation takes only two or three lines.
Check it out and tell us what you think!