I wrote an NLP engine and want to run it against some standard benchmarks. What would be a good way to measure its performance and compare the results?