And which alleged logic failure is that?
Especially in the context of "evaluating the performance of something".
Let's expand this a little to make it even more evident: if the task were "write a paragraph of 100 words using only 5-letter words" and one AI couldn't produce anything at all, while another came up with a 100-word paragraph in which a couple of the words had 6 or 4 letters, it would make absolutely no sense to rate the first as "better" than the second at performing the task.
As for understanding the task, the latter clearly exhibits an understanding of it (it produced a paragraph, and most of the words it used met the criteria, which wouldn't happen if it chose them randomly); it just made a couple of mistakes (the kind humans could easily make too on such a task). For the former, we can't even be sure it understood the task at all.
We don't rate humans that way on performing tasks either (as if anything less than perfect were worse than not doing it at all). Even university-level math tests give credit for the approach and for partial results in the right direction; they don't mark an answer 0 because of a single error, nor give a higher mark to students who produced nothing.
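To make the partial-credit idea concrete, here is a minimal sketch of a scorer for the hypothetical "100 words, each exactly 5 letters" task from the example above. The function name and the exact weighting are my own assumptions, purely for illustration; the point is only that an empty output scores 0 while a nearly-correct paragraph scores close to 1.

```python
def partial_credit(paragraph, target_words=100, word_len=5):
    """Score an attempt at 'write target_words words of exactly word_len letters'.

    Returns a fraction in [0, 1]: producing nothing scores 0, while a full
    paragraph with a couple of wrong-length words scores close to 1.
    (Hypothetical scoring scheme, for illustration only.)
    """
    words = paragraph.split()
    if not words:
        return 0.0
    # Count words of the required length, ignoring trailing punctuation.
    length_ok = sum(1 for w in words if len(w.strip(".,!?;:")) == word_len)
    count_score = min(len(words), target_words) / target_words
    length_score = length_ok / len(words)
    return count_score * length_score

# The model that produced nothing:
print(partial_credit(""))  # 0.0
# The model that produced 100 words, two of them the wrong length:
attempt = " ".join(["about"] * 98 + ["around", "tiny"])
print(partial_credit(attempt))  # 0.98
```

Under a scorer like this, the second model's 0.98 beats the first model's 0, matching the intuition in the comment above.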
There are many contexts in which correctness is important. In such contexts, an incorrect answer is often worse than an explicit non-answer.
>We don't rate humans that way on performing tasks either (if they got it less than perfect it's worse than not doing it at all). Even math tests at the university
Standardized tests often rate incorrect answers worse than non-answers, though, yes, a university maths test in particular isn't likely to be that sort of test.
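A concrete instance of that kind of standardized-test scoring is the pre-2016 SAT's guessing penalty on multiple-choice questions: a correct answer earned a point, a wrong answer cost a quarter point, and a blank cost nothing. A quick sketch (function name is mine, for illustration):

```python
def sat_raw_score(correct, wrong, blank):
    """Pre-2016 SAT multiple-choice raw score:
    +1 per correct answer, -1/4 per wrong answer, 0 for a blank.
    """
    return correct - wrong * 0.25

# Ten correct plus four wrong answers scores lower than
# ten correct plus four blanks:
print(sat_raw_score(10, 4, 0))  # 9.0
print(sat_raw_score(10, 0, 4))  # 10.0
```

So under that scheme an incorrect answer really is rated worse than an explicit non-answer, which is the asymmetry being described.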