Can we really do this reliably? LLMs are non-deterministic, right? So how do we validate their output in a deterministic way?
We can validate things like the shape of the data being returned, but how do we validate correctness without an independent human in the loop to verify it?
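To be concrete about the part that *is* deterministic: a shape check might look something like the sketch below (using Pydantic v2; the `ExtractionResult` model and its fields are just placeholders I made up for illustration). It proves the output parses into the expected structure, but says nothing about whether the values are actually correct.

```python
from pydantic import BaseModel, ValidationError


# Hypothetical schema for whatever structured output we asked the LLM for.
class ExtractionResult(BaseModel):
    title: str
    year: int
    authors: list[str]


def validate_shape(raw_json: str) -> ExtractionResult | None:
    """Deterministic check: does the LLM output parse into the expected schema?

    Returns the parsed object on success, or None if the JSON is malformed
    or the fields/types don't match. This catches structural failures only,
    not factual errors in the field values.
    """
    try:
        return ExtractionResult.model_validate_json(raw_json)
    except ValidationError:
        return None
```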