Faster R-CNN: Down the rabbit hole of modern object detection

(tryolabs.com)

113 points · vierja · 8y ago · 10 comments

10 comments · 5 top-level

nnq · 8y ago · 3 in thread

Wasn't R-CNN already superseded by YOLO [1]? I didn't read the article, but there seems to be no mention of YOLO as a point of comparison, so it may be outdated.

Has anyone had the time to dig deeper into this?

[1] https://pjreddie.com/media/files/papers/yolo.pdf

eggie5 · 8y ago

Tradeoffs: RCNN has better accuracy. YOLO is faster.

pilooch · 8y ago

R-CNN is a two-stage detector and SSD is single-stage.

bitL · 8y ago

Take a look at SSD instead; it seems to be more precise than YOLO and a bit faster. R-CNN variants are usually 10x slower than either of these two.

rambossa · 8y ago · 1 in thread

Does anyone try to get accurate bounding boxes (rotation, correct angle) with these object detection models? Or does that make the problem much harder?

electrograv · 8y ago

That’s exactly what Faster-RCNN does. Edit: Except for rotation — they are axis aligned bounding boxes.

Mask-RCNN (more recent) takes it a step further and also generates a per-object pixel segmentation mask, which is even better than a bounding box obviously. For that reason, Mask-RCNN is much more exciting to me, and incredibly impressive if you see examples showing what it can do.

That said, “under the hood” of Mask-RCNN are still axis aligned 2D bounding boxes for every object (and this occasionally creates artifacts when a box is erroneously too small and crops off part of an object). IMO we need to somehow get away from these AABBs, but right now methods that use them simply work the best.
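To make the cropping artifact concrete, here is a toy sketch in plain Python. It assumes the mask is only kept inside the detected box, which is the situation described above. The helper names (`tight_box`, `crop_to_box`, `area`) and the example mask are made up for illustration and are not Mask-RCNN's actual code.

```python
def tight_box(mask):
    """Smallest axis-aligned box (x1, y1, x2, y2), inclusive, covering all 1-pixels."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    return (min(xs), min(ys), max(xs), max(ys))


def crop_to_box(mask, box):
    """Zero out every mask pixel that falls outside the box."""
    x1, y1, x2, y2 = box
    return [
        [v if (x1 <= x <= x2 and y1 <= y <= y2) else 0 for x, v in enumerate(row)]
        for y, row in enumerate(mask)
    ]


def area(mask):
    """Number of foreground pixels in a binary mask."""
    return sum(sum(row) for row in mask)


# Toy 5x6 ground-truth mask of a diagonal object (hypothetical data).
mask = [
    [1, 1, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1, 1],
]

full = tight_box(mask)    # box that covers the whole object
too_small = (0, 0, 3, 3)  # hypothetical predicted box that is too small
lost = area(mask) - area(crop_to_box(mask, too_small))
```

With the undersized box, three of the object's ten pixels are cut off no matter how good the per-pixel prediction is. The tight box also spans 30 pixels for a 10-pixel object, which hints at why axis-aligned boxes are a loose fit for rotated or diagonal shapes.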

BillyParadise · 8y ago · 1 in thread

Is this what they use for self driving cars?

bitL · 8y ago

Faster R-CNN gives you only about 5 fps on a high-end GPU, so the answer is no.

nicodjimenez · 8y ago

Object detection is an interesting failure for deep learning. Systems such as these perform well, but whenever you have something like non-max suppression at the end, you are bound to get hard-to-fix errors. I'm more optimistic about DeepMask and similar pixel-wise approaches, as well as using RNNs to generate a list of objects from an image.
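For anyone who hasn't seen it, here is a minimal sketch of greedy non-max suppression in plain Python, showing one kind of error it can cause. It is a generic textbook version, not the code from Faster R-CNN or any particular library, and the boxes, scores, and threshold values are made up for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the top-scoring box, drop boxes overlapping it, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Discard every remaining box that overlaps the kept one too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


# Hypothetical scene: two distinct objects side by side, plus one far away.
boxes = [(0, 0, 10, 10), (2, 0, 12, 10), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores, iou_threshold=0.5)
```

The first two boxes have an IoU of about 0.67, so at a threshold of 0.5 the second one is suppressed even if it was a correct detection of a neighbouring object. Raising the threshold to 0.7 keeps it, but then more true duplicates get through. That tradeoff is fixed by a hand-picked number after the network has run, which is one reason these errors are hard to train away.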

swframe2 · 8y ago

I saw this today: https://github.com/facebookresearch/Detectron
