I built a straightforward demo to showcase Gemini's object detection capabilities. Upload any image, get bounding boxes and clean JSON output. It's currently the only model I've found that can accurately return object bounds.
I've just been playing around with the bounding box feature myself. Do you mind sharing how you translate the bounding boxes back to the correct dimensions and position on the original image?
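For anyone else stuck on the rescaling step, here is a minimal sketch. It assumes the convention described in Gemini's documentation, where each box comes back as `[ymin, xmin, ymax, xmax]` with values normalized to a 0-1000 range; it is not necessarily how the demo above does it. The function name `to_pixel_box` and the `scale` parameter are made up for illustration.

```python
from typing import Sequence, Tuple


def to_pixel_box(
    box: Sequence[float],
    image_width: int,
    image_height: int,
    scale: int = 1000,  # assumed normalization range (0-1000)
) -> Tuple[int, int, int, int]:
    """Convert a normalized [ymin, xmin, ymax, xmax] box to pixel coordinates.

    Returns (left, top, right, bottom) in pixels of the original image.
    """
    # Assumed axis order: y comes first in the model output.
    ymin, xmin, ymax, xmax = box

    # x values scale with the width, y values with the height.
    left = round(xmin * image_width / scale)
    top = round(ymin * image_height / scale)
    right = round(xmax * image_width / scale)
    bottom = round(ymax * image_height / scale)
    return left, top, right, bottom


if __name__ == "__main__":
    # Hypothetical box on a 1920x1080 image.
    print(to_pixel_box([250, 100, 750, 500], 1920, 1080))
```

Two things tend to trip people up here: the y coordinate comes first, and because the values are normalized, they should be scaled by the original image's width and height even if a resized copy was sent to the model.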