Images to Text – Toronto Deep Learning Demos

(deeplearning.cs.toronto.edu)

75 points · benanne · 11y ago · 17 comments

tly_alex · 11y ago · 3 in thread

Rekognition released a similar image-to-text API, and it's much more reliable than this. At least the demo works smoothly and responds fast. https://rekognition.com/demo/concept

teraflop · 11y ago

Even leaving aside the reliability issue (which can be chalked up to the fact that this one is a demo of a non-commercial project that got overloaded), you're comparing two entirely different things.

Check out the "static demo" pages, e.g. http://www.cs.toronto.edu/~nitish/nips2014demo/results/79133...

For this image, the University of Toronto software generates sentences like "a cow is standing in the grass by a car", whereas Rekognition only produces a ranked list of categories. ("sports_car", "car_wheel", etc.)

EDIT: this is an even better example: http://www.cs.toronto.edu/~nitish/nips2014demo/results/89407... I'm cherry-picking the cases where the algorithm does well, of course. But even if it's unreliable, the fact that this works at all is impressive.

modeless · 11y ago

The errors are fascinating. "a cow and a car are looking at the camera." "a band plays a group of music [...]". You could almost call them metaphors instead of errors.

CardinalAgnelo · 11y ago

The demo is clearly designed for the small community of machine learning researchers to play around with it to better evaluate the papers they wrote. They aren't selling a product and probably have a hard time justifying using a lot of computing resources to host the demo. Furthermore, the models are probably optimized for result quality, not speed.

cmyr · 11y ago · 2 in thread

My brief survey suggests that their training sample did not include very much hardcore pornography.

"a man and a girl are learning to play with a small pool", while poetic, is a stretch in this case.

JacobEdelman · 11y ago

Already after 1 hour of this being posted on hn... Reminders abound of how evolution only made us good tool makers to help us to reproduce more.

dzordzduan · 11y ago

This is why I love hn.

YoukaiCountry · 11y ago · 1 in thread

So far I keep getting the error "Cannot connect to server of image2text models"

Anyone having any luck?

bootynuke · 11y ago

I think it must be getting slammed; I was able to get a couple of descriptions out of it, but that was balanced by probably 2 times as many instances of the above error.

finin · 11y ago · 1 in thread

http://www.skunkieacres.com/images/rabbit_box.jpg

A picture of a rabbit in a wooden box => "a cat looking into a bin full of apples"

Mistaking a rabbit for a cat is not too bad. A bin is like a box, I suppose. I'm not sure where the apples came from.

thomasahle · 11y ago

Perhaps it's been trained with pictures of apples in boxes...

JacobEdelman · 11y ago

Looks amazing. The fact that it just returns the "Cannot connect to server of image2text models" error makes me very sad.

tonydiv · 11y ago

We are using this research to help people learn languages in VR.

Take a look here: http://learnimmersive.com

CardinalAgnelo · 11y ago

Doesn't look to be designed for a lot of traffic; be gentle.

misiti3780 · 11y ago

Very cool:

Comment: If you click on source code right now, it gives me two JavaScript alerts that were trying to print out JSON objects.

vonnik · 11y ago

I'm curious to hear how much this is read as a sign of strong AI.
