why not just adjust the decoder / beam search so it never emits a token that would make the output syntactically invalid JSON?
i.e. instead of using temperature to sample something from the top k most likely tokens, first exclude all the tokens that would make the output malformed, then sample from what's left. for example, ignoring leading whitespace, the first token can only begin a JSON value: {, [, ", a digit or minus sign, or the start of true / false / null.
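here's a minimal sketch of that masking step, in python rather than llama.cpp's C++ and over a made-up ten-token vocabulary. `masked_sample`, `valid_first_tokens` and `VOCAB` are hypothetical names i picked for illustration, and the first-token check is deliberately crude (it only looks at how a token starts, not at the whole token):

```python
import math
import random

def masked_sample(logits, allowed, temperature=1.0, rng=random):
    """Sample a token index from logits, considering only indices in `allowed`."""
    if not allowed:
        raise ValueError("no valid token to sample")
    # Disallowed tokens get -inf, so exp() gives them exactly zero weight.
    masked = [l / temperature if i in allowed else -math.inf
              for i, l in enumerate(logits)]
    peak = max(masked)  # subtract the max for numerical stability
    weights = [math.exp(m - peak) for m in masked]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

# Toy vocabulary (hypothetical; a real tokenizer has tens of thousands of entries).
VOCAB = ["Sure", "{", "[", '"', "Here", "7", "-", "true", "null", "</s>"]
STRUCTURAL_START = set('{["-0123456789')
LITERALS = ("true", "false", "null")

def valid_first_tokens(vocab):
    """Indices of tokens that could begin a JSON value (crude prefix check)."""
    return {
        i for i, tok in enumerate(vocab)
        if tok[0] in STRUCTURAL_START
        or any(lit.startswith(tok) for lit in LITERALS)
    }

if __name__ == "__main__":
    # The unconstrained model would much rather say "Sure" or "Here".
    logits = [9.0, 1.0, 0.5, 0.2, 8.0, 0.1, 0.0, 0.3, 0.1, 2.0]
    allowed = valid_first_tokens(VOCAB)
    print(VOCAB[masked_sample(logits, allowed, temperature=0.8)])
```

temperature and top-k still work as before; they just operate on whatever survives the mask.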
if someone would like a fun project to try this right away, one place to start would be to modify llama.cpp's chat example just before the line that samples tokens [1], going through `lctx.logits` and masking out invalid tokens (these are logits rather than probabilities, so set them to -INFINITY instead of zero). as a smoke test, fix the first token of the model's output to "{" without any other changes and i bet you'd get something approaching JSON out.
[1]: here's the line to change: https://github.com/ggerganov/llama.cpp/blob/c4fe84fb0d28851a... see the bit on lines 317-319 about how it ignores the end-of-sequence token by zeroing out the probability of sampling it? just like that!
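to make the smoke test concrete, here's a rough python sketch of the same idea. it does not use llama.cpp's actual API: `next_logits` is a stand-in for the model, and `ban_token`, `force_token` and `greedy_decode` are hypothetical helpers. the only assumption is that you get a plain list of logits per step:

```python
import math

def ban_token(logits, token_id):
    """Return a copy of logits in which token_id can never be picked."""
    out = list(logits)
    out[token_id] = -math.inf
    return out

def force_token(logits, token_id):
    """Return a copy of logits in which only token_id can be picked."""
    return [l if i == token_id else -math.inf for i, l in enumerate(logits)]

def greedy_decode(next_logits, steps, open_brace_id):
    """Greedy decoding with the first token pinned to "{".

    next_logits(tokens) -> list[float] is a stand-in for a real model call.
    """
    tokens = []
    for step in range(steps):
        logits = next_logits(tokens)
        if step == 0:
            # The smoke test: everything except "{" is masked on step 0.
            logits = force_token(logits, open_brace_id)
        tokens.append(max(range(len(logits)), key=logits.__getitem__))
    return tokens

if __name__ == "__main__":
    # Fake three-token model that always prefers token 0.
    fake_model = lambda tokens: [5.0, 1.0, 0.0]
    print(greedy_decode(fake_model, steps=3, open_brace_id=1))
```

`ban_token` is the end-of-sequence trick from [1] applied to one id; `force_token` is the same mask turned inside out.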
i mean, the most principled approach probably requires some theoretical CS knowledge about regular expression derivatives or derivatives of parsers, but i'm surprised it isn't more common to just hook into the decoding step a little, given how much we want structured data out of these models
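for the curious, here's a small sketch of what the regular expression derivative version could look like (Brzozowski derivatives), assuming tokens are plain strings. full JSON is not a regular language because of nesting, so this only covers a regular fragment, JSON integers; a real implementation would need parser state on top. all the names here (`deriv`, `token_allowed`, `INT`, ...) are mine:

```python
EMPTY = ("empty",)  # matches nothing
EPS = ("eps",)      # matches only the empty string

def char(chars):
    return ("char", frozenset(chars))

# Smart constructors: they simplify as they build, so a regex whose
# language is empty always collapses to EMPTY itself.
def alt(a, b):
    if a == EMPTY:
        return b
    if b == EMPTY:
        return a
    return ("alt", a, b)

def cat(a, b):
    if a == EMPTY or b == EMPTY:
        return EMPTY
    if a == EPS:
        return b
    if b == EPS:
        return a
    return ("cat", a, b)

def star(a):
    return EPS if a in (EMPTY, EPS) else ("star", a)

def nullable(r):
    """True if r matches the empty string."""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "char"):
        return False
    if tag == "alt":
        return nullable(r[1]) or nullable(r[2])
    return nullable(r[1]) and nullable(r[2])  # cat

def deriv(r, c):
    """Regex matching what may follow after r has consumed character c."""
    tag = r[0]
    if tag in ("empty", "eps"):
        return EMPTY
    if tag == "char":
        return EPS if c in r[1] else EMPTY
    if tag == "alt":
        return alt(deriv(r[1], c), deriv(r[2], c))
    if tag == "star":
        return cat(deriv(r[1], c), r)
    head = cat(deriv(r[1], c), r[2])  # cat
    return alt(head, deriv(r[2], c)) if nullable(r[1]) else head

def after(r, text):
    for c in text:
        r = deriv(r, c)
    return r

def token_allowed(r, token):
    """A token is allowed if the output could still be completed to a match."""
    return after(r, token) != EMPTY

# JSON integer: -?(0|[1-9][0-9]*)
INT = cat(alt(char("-"), EPS),
          alt(char("0"), cat(char("123456789"), star(char("0123456789")))))

if __name__ == "__main__":
    vocab = ["-", "0", "12", "1a", "Sure"]
    print([t for t in vocab if token_allowed(INT, t)])
```

in a decoder you'd keep the current derivative as state: mask every token for which `token_allowed` is false, sample, then replace the state with `after(state, token)`. `nullable(state)` tells you when stopping is legal.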
i wish i knew how to voice my ignorant skepticism in a less disparaging way, sorry.... but i feel like a lot of this "legitimization of prompt engineering as a useful trade/practice" thinking assumes that we're trapped in the "magic circle" where the only input we have to the model is picking the prompt and the only possible output is the most likely token. but these are generative models! at every step they give us a whole distribution over next tokens, and we get to choose which token to accept, so why not just condition on the distribution of possible JSON output instead of the distribution of possible prose?
i suspect very quickly the most competitive prompt engineers will combine a solid understanding of theoretical machine learning and statistics with a solid understanding of computer science, perhaps even with a dash of persuasion / neurolinguistic programming experience. kinda worries me but it's how it is