https://news.ycombinator.com/item?id=12414746
https://www.insidehighered.com/news/2016/09/02/massachusetts...
Many (though certainly not all) of the instructors using Gradescope are teaching CS or Math courses with heavy enrollment. So each exam will have many submissions (even 1000+), and each submission will have a lot of short answers. Marking each one on its own is tedious, but until recently it was the state of the art for paper exams.
Instructors can and do grade essays on Gradescope, and are able to save time. But in that case the savings come from being able to create rubrics on the fly, to change point values without re-adjusting every single marked paper, to grade across questions rather than across exams, to publish grades without having to type them all in, and so on.
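To make the "change point values without re-adjusting every marked paper" part concrete, here is a minimal sketch of one way it could work: each paper stores only the ids of the rubric items applied to it, and the score is computed on demand. This is not Gradescope's actual data model; `RubricItem`, `Submission`, and `score` are hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RubricItem:
    description: str
    points: float  # negative values are deductions


@dataclass
class Submission:
    student: str
    # Ids of the rubric items the grader applied; no point values are stored here.
    applied: set[str] = field(default_factory=set)


def score(submission: Submission, rubric: dict[str, RubricItem], max_points: float) -> float:
    """Compute the score from the current rubric, so rubric edits apply retroactively."""
    return max_points + sum(rubric[item_id].points for item_id in submission.applied)


# Hypothetical rubric with a single deduction, applied to one paper.
rubric = {"no_units": RubricItem("Missing units", -1.0)}
paper = Submission("ann", {"no_units"})
```

Because the paper only references `"no_units"`, changing `rubric["no_units"].points` changes the score of every paper that has that item, with no regrading pass.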
There's a lot of grunt work that goes into grading, and it doesn't need to be the case :)
Also, they have had a robot grading GMAT essays since 1999 (http://www.800score.com/content/essay.html)
It seems like the AI is identifying equivalent answers among respondents. So if you mark an answer correct on one test, every other test with the same answer will be marked correct. I worked for a small competitor of this product in college and we had a lot of trouble with this problem, especially with answers that were prone to spelling mistakes, or could be written in many ways. Kudos to them for doing this well.
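A minimal sketch of the grouping idea, assuming answers are already transcribed to text and that "equivalent" just means "identical after crude normalization". The real product presumably does something much smarter; `normalize`, `group_answers`, and `propagate` are hypothetical helpers, not anyone's actual API.

```python
import re
from collections import defaultdict


def normalize(answer: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace, trim stray periods."""
    cleaned = re.sub(r"[^a-z0-9. -]+", " ", answer.lower())
    return " ".join(cleaned.split()).strip(".")


def group_answers(submissions: dict[str, str]) -> dict[str, list[str]]:
    """Map each normalized answer to the students who gave it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for student, answer in submissions.items():
        groups[normalize(answer)].append(student)
    return dict(groups)


def propagate(groups: dict[str, list[str]], graded: dict[str, float]) -> dict[str, float]:
    """Give every student in a graded group the points assigned to that group."""
    return {
        student: points
        for key, points in graded.items()
        for student in groups.get(key, [])
    }
```

With answers like "Newton", " newton ", and "NEWTON!" this yields one group, so a single grading decision covers all three. A misspelling such as "Nweton" lands in its own group, which is exactly the hard case mentioned above; handling it would need fuzzy matching, and that is where false positives can creep in.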
Another fun bit of AI in this space is in identifying where the answer key might have made a mistake. We developed some algorithms for determining the most likely answer to a problem given the responses. We never released it, but I worked on a tool that would grade tests without an answer key at all. Using 50-question tests in a few freshman physics classes I was able to get the right answer a little over 97% of the time.
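The comment doesn't say which algorithms were used, so as a stand-in here is the simplest thing that could plausibly work: a plurality vote per question, plus a check that flags questions where the vote disagrees with the official key. `infer_key` and `suspect_questions` are hypothetical names, and plurality voting is an assumption, not the method described above.

```python
from collections import Counter


def infer_key(responses: list[list[str]]) -> list[str]:
    """For each question, take the most common response across all students."""
    num_questions = len(responses[0])
    key = []
    for q in range(num_questions):
        counts = Counter(student[q] for student in responses)
        key.append(counts.most_common(1)[0][0])
    return key


def suspect_questions(official_key: list[str], responses: list[list[str]]) -> list[int]:
    """Return indices of questions where the plurality answer differs from the official key."""
    inferred = infer_key(responses)
    return [q for q, (official, voted) in enumerate(zip(official_key, inferred)) if official != voted]
```

A plain vote breaks down whenever a common misconception outnumbers the correct answer, so a flagged question is a prompt for a human to look at the key, not proof that the key is wrong.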
Amusingly, the exams always asked for a numerical answer, from which we could guess which mistake they made. Then we would find that mistake in their calculation and mark it. Without that trick, identifying the specific mistake in each answer was a pretty tedious process.
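A rough sketch of how that trick can be mechanized: precompute the number each known mistake would produce, then match the student's final answer against that table. The problem (distance under constant acceleration, d = v0*t + a*t^2/2), the values, and the list of mistakes are all made up for illustration; `diagnose` is a hypothetical helper.

```python
import math

# Hypothetical problem: distance travelled under constant acceleration.
v0, a, t = 2.0, 9.8, 3.0

# What each (assumed) common mistake would yield as a final answer.
candidates = {
    "correct": v0 * t + 0.5 * a * t**2,
    "dropped the 1/2 factor": v0 * t + a * t**2,
    "ignored initial velocity": 0.5 * a * t**2,
}


def diagnose(student_answer: float, candidates: dict[str, float], rel_tol: float = 1e-3):
    """Return the label of the first candidate matching the answer, or None if nothing matches."""
    for label, value in candidates.items():
        if math.isclose(student_answer, value, rel_tol=rel_tol):
            return label
    return None
```

An answer that matches no candidate still needs a human to read the work, and two different mistakes can occasionally produce the same number, so the label is a hint about where to look rather than a verdict.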
It seems like they've put a lot of work towards eliminating the remaining bottleneck: actually assigning grades (which was done by a dozen profs/TAs in a 10-hour marathon). Anything that automates busywork and allows teachers to focus on actually teaching sounds amazing!
Although my guess is the same thing would have happened with the human TA also.
and it's still one of the most complex problems in automated pattern recognition.
Sadly, as terrible as Blackboard is, it has a large market share (it must be good at the backend stuff or something, because it is terrible to use as an instructor, especially if you teach multiple sections of the same class).
This approach looks much more sane, provided they avoid any false positives slipping into a group.