Check out the "static demo" pages, e.g. http://www.cs.toronto.edu/~nitish/nips2014demo/results/79133...
For this image, the University of Toronto software generates sentences like "a cow is standing in the grass by a car", whereas Rekognition only produces a ranked list of categories ("sports_car", "car_wheel", etc.).
EDIT: this is an even better example: http://www.cs.toronto.edu/~nitish/nips2014demo/results/89407... I'm cherry-picking the cases where the algorithm does well, of course. But even if it's unreliable, the fact that this works at all is impressive.
"a man and a girl are learning to play with a small pool", while poetic, is a stretch in this case.
Anyone having any luck?
A picture of a rabbit in a wooden box => "a cat looking into a bin full of apples"
Mistaking a rabbit for a cat is not too bad. A bin is like a box, I suppose. I'm not sure where the apples came from.
Take a look here: http://learnimmersive.com
Comment: If you click on the source code right now, it gives me two JavaScript alerts that seem to be trying to print out JSON objects.