The bigger problem I see is professors who think "Oh well, it's sophisticated AI, it can't make a mistake" and take the results for granted.
In one of the Twitter threads I saw a screenshot from a professor's email where they mentioned that "X student had 100 more eye movements than Y student" and threatened to fail the entire class.
That email blew my mind because it seemed like the professor just didn't know or didn't care that the software was the problem here. And that's the real issue.