1. Intro - a note on the overall problem domain (object detection in this case, zoomed in a bit to the deep-learning space). 2. Related work - work so far in the domain, without criticizing it. 3. Problem statement - the knowledge gap in the related work that this paper addresses. 4. Solution - how we addressed the gap. 5. Validation - how we claim our solution addressed the gap it was intended to address.
This paper's abstract covers only the last part, and sporadically a bit of 2. What I want from an abstract is: "what is the new learning in the YOLOv7 architecture?"
Perhaps the bigger picture here is that it points to metric-chasing as a proxy for a "research agenda" in the ML community.
If you go to the associated code, you'll see that it defines a 'backbone', a 'neck', etc. What is a backbone? Questions that arise directly from the code will lead you toward good blog articles and similar resources. https://huggingface.co/spaces/nateraw/yolov6/blob/main/yolov...
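To make the backbone/neck/head vocabulary concrete, here is a minimal, purely illustrative sketch of how YOLO-style detectors are structured. None of the layer sizes or names come from the YOLOv7 code; this is just the generic pattern (PyTorch, assumed available):

```python
import torch
import torch.nn as nn

class TinyDetector(nn.Module):
    """Hypothetical toy detector showing the backbone/neck/head split."""

    def __init__(self, num_classes=80):
        super().__init__()
        # Backbone: extracts feature maps from the raw image
        # (real detectors use CSPDarknet, ResNet, etc.).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Neck: refines/fuses features, often across scales (e.g. FPN, PAN).
        self.neck = nn.Sequential(
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        # Head: per-grid-cell predictions: 4 box coords + 1 objectness
        # score + one score per class.
        self.head = nn.Conv2d(64, 4 + 1 + num_classes, 1)

    def forward(self, x):
        return self.head(self.neck(self.backbone(x)))

model = TinyDetector()
out = model(torch.zeros(1, 3, 64, 64))
print(out.shape)  # (1, 85, 16, 16): 85 = 4 box + 1 obj + 80 classes
```

Real models differ enormously in the details, but the three-stage decomposition is what the config files in the repo are naming.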
OTOH, you could have a look at (for instance) the Stanford vision courses for a more theoretical approach. But the code itself is often a solid guide to what's going on (the frameworks used for deep learning map well onto what's discussed in blogs/lectures/papers).
Here's a good resource: https://eli.thegreenplace.net/2018/depthwise-separable-convo....
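The payoff of the depthwise-separable trick that article describes is easy to see in parameter counts. A quick back-of-the-envelope sketch (illustrative channel/kernel sizes, bias terms omitted):

```python
def standard_conv_params(c_in, c_out, k):
    # Standard conv: one k x k filter per (input channel, output channel) pair.
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    # Depthwise step: one k x k filter per input channel (spatial filtering).
    # Pointwise step: a 1 x 1 conv that mixes channels.
    return c_in * k * k + c_in * c_out

std = standard_conv_params(64, 128, 3)        # 64 * 128 * 9 = 73728
sep = depthwise_separable_params(64, 128, 3)  # 576 + 8192   = 8768
print(std, sep, round(std / sep, 1))          # roughly an 8x reduction here
```

That parameter (and FLOP) reduction is a big part of why these layers show up in real-time architectures.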
Does anyone know of any more?
Something called YOLOv7.
Yikes. It's not clear to me whether that's the upper limit on accuracy or a limit imposed by requiring that it run at 30 FPS, but still... yikes.
From the paper:
> For example, multi-object tracking [94, 93], autonomous driving [40, 18], robotics [35, 58], medical image analysis [34, 46], etc.
LOL, these are all great use cases for a model with < 60% accuracy!
While the author likely didn't have that intention, that's what came across.
Even with YOLO meaning "You Only Look Once", YOLO and v7 do not go together well.
The point I was making is that YOLO and v7 don't go well together, and that is true for either meaning of YOLO.