
0 points · akshayKMR · 1y ago · 0 comments

I haven't used vision models before. Can someone comment on whether they are good at "pointing at things"? E.g., given a picture, return the coordinates of the text "foo".

This is the key to accurate control; it needs to be very precise.

Maybe Claude's model is trained for this. Also, what about open-source vision models? Are any of them good at "pointing at things" on a typical computer screen?
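
To make "very precise" concrete, here is a minimal sketch of how one might score a model's pointing. It assumes the model returns a point in normalized (0 to 1) image coordinates and that you have a ground-truth bounding box for the target text; both are assumptions, and the names `Box`, `to_pixels`, `hits`, and `hit_rate` are hypothetical helpers, not part of any model's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Ground-truth bounding box of the target text, in pixels (hypothetical type)."""
    left: float
    top: float
    right: float
    bottom: float


def to_pixels(x_norm: float, y_norm: float, width: int, height: int) -> tuple[int, int]:
    """Convert an assumed normalized (0..1) point to pixel coordinates."""
    return round(x_norm * width), round(y_norm * height)


def hits(point: tuple[int, int], box: Box) -> bool:
    """True if the predicted point lands inside the target box (edges included)."""
    x, y = point
    return box.left <= x <= box.right and box.top <= y <= box.bottom


def hit_rate(points: list[tuple[int, int]], boxes: list[Box]) -> float:
    """Fraction of predicted points that land inside their paired target box."""
    if len(points) != len(boxes):
        raise ValueError("points and boxes must be paired one-to-one")
    if not points:
        return 0.0
    return sum(hits(p, b) for p, b in zip(points, boxes)) / len(points)
```

For example, a prediction of (0.5, 0.25) on a 1920x1080 screenshot becomes a click at pixel (960, 270), which counts as a hit only if the box around "foo" contains that pixel. Small UI targets make this a strict test, which is why precision matters so much for control.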


1 comment · 1 top-level

swyx · 1y ago

i mean, like with everything, they'll kinda be able to do it, and they'll only get really good at it if the model trainers prioritized it. see PixMo for a recent example
