1. Intro - a note on the overall problem domain (object detection in this case, zoomed in a bit to the deep-learning space). 2. Related work - work so far in the domain, without criticizing it. 3. Problem statement - the knowledge gap in the related work that this paper addresses. 4. Solution - how we addressed the gap. 5. Validation - how we claim our solution addressed the gap it was intended to address.
This paper's abstract covers only the last part, and sporadically a bit of 2. What I want from an abstract is: "what is the new learning in the YOLOv7 architecture?"
Perhaps the bigger picture here is that it points to metric-chasing as a proxy for a "research agenda" in the ML community.
If you go to the associated code, you'll see that it defines a 'backbone', a 'neck', etc. What is a backbone? Questions that arise directly from the code will lead you toward good blog articles and similar resources. https://huggingface.co/spaces/nateraw/yolov6/blob/main/yolov...
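To make the backbone/neck/head vocabulary concrete, here is a minimal, purely illustrative sketch of how YOLO-style detectors are structured. None of the layer sizes or names come from the YOLOv7 code; this is just the generic pattern (PyTorch, assumed available):

```python
import torch
import torch.nn as nn

class TinyDetector(nn.Module):
    """Hypothetical toy detector showing the backbone/neck/head split."""

    def __init__(self, num_classes=80):
        super().__init__()
        # Backbone: extracts feature maps from the raw image
        # (real detectors use CSPDarknet, ResNet, etc.).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Neck: refines/fuses features, often across scales (e.g. FPN, PAN).
        self.neck = nn.Sequential(
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        # Head: per-grid-cell predictions: 4 box coords + 1 objectness
        # score + one score per class.
        self.head = nn.Conv2d(64, 4 + 1 + num_classes, 1)

    def forward(self, x):
        return self.head(self.neck(self.backbone(x)))

model = TinyDetector()
out = model(torch.zeros(1, 3, 64, 64))
print(out.shape)  # (1, 85, 16, 16): 85 = 4 box + 1 obj + 80 classes
```

Real models differ enormously in the details, but the three-stage decomposition is what the config files in the repo are naming.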
OTOH, you could have a look at (for instance) the Stanford vision courses for a more theoretical approach. But the code itself is often a solid guide to what's going on (the frameworks used for deep learning map well onto what's discussed in blogs/lectures/papers).
Here's a good resource: https://eli.thegreenplace.net/2018/depthwise-separable-convo....
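The payoff of the depthwise-separable trick that article describes is easy to see in parameter counts. A quick back-of-the-envelope sketch (illustrative channel/kernel sizes, bias terms omitted):

```python
def standard_conv_params(c_in, c_out, k):
    # Standard conv: one k x k filter per (input channel, output channel) pair.
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    # Depthwise step: one k x k filter per input channel (spatial filtering).
    # Pointwise step: a 1 x 1 conv that mixes channels.
    return c_in * k * k + c_in * c_out

std = standard_conv_params(64, 128, 3)        # 64 * 128 * 9 = 73728
sep = depthwise_separable_params(64, 128, 3)  # 576 + 8192   = 8768
print(std, sep, round(std / sep, 1))          # roughly an 8x reduction here
```

That parameter (and FLOP) reduction is a big part of why these layers show up in real-time architectures.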
Does anyone know of any more?
Something called YOLOv7.
Yikes. It's not clear to me whether that's the upper limit on accuracy or a limit imposed by requiring that it run at 30 FPS, but still... yikes.
From the paper:
> For example, multi-object tracking [94, 93], autonomous driving [40, 18], robotics [35, 58], medical image analysis [34, 46], etc.
LOL, these are all great use cases for a model with < 60% accuracy!
While the author likely didn't have that intention, that's what came across.
Even with YOLO meaning "You Only Look Once", YOLO and v7 do not go together well.
The point I was making is that YOLO and v7 don't go well together, and that is true for either meaning of YOLO.