Even with only open-source LLMs, cross-validating their outputs shows interesting differences in which issues each model flags.
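One way to make those differences concrete is to collect each model's flagged issues as a set and split them into issues every model agrees on and issues only some models raise. The sketch below is illustrative only: the function name `compare_flags`, the model names, and the issue labels are all hypothetical, and it assumes the issues have already been normalized to comparable strings (which in practice is the hard part).

```python
from collections import Counter


def compare_flags(flags_by_model):
    """Split flagged issues into consensus issues and disputed ones.

    flags_by_model maps a model name to the set of issues it flagged.
    Returns (consensus, disputed), where consensus is the set of issues
    flagged by every model and disputed maps each remaining issue to the
    sorted list of models that flagged it.
    """
    # Count how many models flagged each issue (each model counted once).
    counts = Counter(
        issue for issues in flags_by_model.values() for issue in set(issues)
    )
    n_models = len(flags_by_model)

    consensus = {issue for issue, c in counts.items() if c == n_models}
    disputed = {
        issue: sorted(m for m, issues in flags_by_model.items() if issue in issues)
        for issue, c in counts.items()
        if c < n_models
    }
    return consensus, disputed


# Hypothetical data: made-up model names and issue labels, not real results.
flags = {
    "model_a": {"unused variable", "missing null check"},
    "model_b": {"missing null check", "off-by-one in loop"},
    "model_c": {"missing null check", "unused variable"},
}

consensus, disputed = compare_flags(flags)
print("All models agree on:", sorted(consensus))
for issue, models in sorted(disputed.items()):
    print(f"Only {', '.join(models)} flagged: {issue}")
```

The disputed bucket is usually the more interesting one to read through, since it shows where the models disagree rather than where they overlap.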