If you take the generated code snippets and ask an LLM something like "There may or may not be something syntactically or stylistically wrong with the following code. Try to identify any errors or unusual structures that might come up in a technical code review.", it usually finds any problems, or at least differences of opinion on what the best approach is.
(This might work best if you have one LLM critique the code generated by another LLM, e.g. bouncing back and forth between Claude and ChatGPT.)
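
A rough sketch of that back-and-forth in Python, in case it helps. It assumes each model is wrapped in a plain function that takes a prompt string and returns the reply text. The names `Model`, `build_review_prompt`, and `cross_review` are made up for illustration and are not part of any vendor's API, so you would plug in your own Claude or ChatGPT client calls.

```python
from typing import Callable

# Review prompt wording taken from the comment above.
REVIEW_PROMPT = (
    "There may or may not be something syntactically or stylistically wrong "
    "with the following code. Try to identify any errors or unusual structures "
    "that might come up in a technical code review."
)

# Hypothetical type: any function that maps a prompt string to a model reply.
Model = Callable[[str], str]


def build_review_prompt(code: str) -> str:
    """Attach the code snippet to the review prompt."""
    return f"{REVIEW_PROMPT}\n\n{code}"


def cross_review(
    generator: Model, reviewer: Model, task: str, rounds: int = 2
) -> tuple[str, list[str]]:
    """Generate code with one model and have a second model critique it."""
    code = generator(task)
    critiques: list[str] = []
    for _ in range(rounds):
        critique = reviewer(build_review_prompt(code))
        critiques.append(critique)
        # Hand the critique back to the generator and ask for a revision.
        code = generator(
            f"{task}\n\nPrevious attempt:\n{code}\n\n"
            f"Review comments:\n{critique}\n\nPlease revise the code."
        )
    return code, critiques
```

You would call it as `cross_review(ask_claude, ask_chatgpt, "Write a function that ...")`, where `ask_claude` and `ask_chatgpt` are your own wrappers. Keeping the reviewer a separate function is what lets you swap in a different model for the critique step.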