That’s exactly what Faster-RCNN does. Edit: except for rotation. Its outputs are axis-aligned bounding boxes.
Mask-RCNN (more recent) takes it a step further and also generates a per-object pixel segmentation mask, which tells you a lot more about the object than a bounding box does. For that reason, Mask-RCNN is much more exciting to me, and it's incredibly impressive if you see examples of what it can do.
That said, “under the hood” Mask-RCNN still uses axis-aligned 2D bounding boxes for every object (and this occasionally creates artifacts when a predicted box is too small and crops off part of the object). IMO we need to somehow get away from these AABBs, but right now the methods that use them simply work the best.
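
To make the cropping artifact concrete, here's a toy sketch in plain Python. It is not Mask-RCNN code: `tight_aabb`, `clip_to_box`, and the little 5x6 mask are all made up for illustration, and the only thing assumed about the real model is that a mask is predicted inside its box, so nothing outside the box can be labeled.

```python
def tight_aabb(mask):
    """Smallest axis-aligned box (x0, y0, x1, y1), inclusive, around all nonzero pixels.

    Assumes the mask has at least one nonzero pixel.
    """
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    return min(xs), min(ys), max(xs), max(ys)


def clip_to_box(mask, box):
    """Zero out every pixel outside the box, mimicking a mask that is only
    predicted inside its (possibly wrong) bounding box."""
    x0, y0, x1, y1 = box
    return [
        [v if (x0 <= x <= x1 and y0 <= y <= y1) else 0 for x, v in enumerate(row)]
        for y, row in enumerate(mask)
    ]


def area(mask):
    """Number of foreground pixels."""
    return sum(sum(row) for row in mask)


# Hypothetical ground-truth object: a 4x3 block of foreground pixels.
obj = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]

good_box = tight_aabb(obj)   # box that fully contains the object
small_box = (1, 1, 3, 3)     # box that is one column too narrow

print(area(obj), area(clip_to_box(obj, good_box)), area(clip_to_box(obj, small_box)))
```

With the tight box the whole object survives; with the too-narrow box the rightmost column of the object is lost no matter how good the mask prediction inside the box is, which is the artifact described above.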