YOLOv6: Redefine state-of-the-art for object detection

(dagshub.com)

138 points · npRandom · 3y ago · 61 comments

klohto · 3y ago · 8 in thread

> it's important to note that MT-YOLOv6 is not part of the official YOLO series

I don't understand the logic behind it. Since you're not part of the official series, why create the confusion? Now the "official" YOLO will release v6 and then what? Or do you expect them to skip it because you already made v6?

To me, it seems disrespectful, just add a different suffix.

garblegarble · 3y ago

> Now the "official" YOLO will release v6 and then what

I agree that it's disrespectful, although FYI the 'official' YOLO is done - the author, Joseph Redmon, has quit the field[1] because of the military and privacy concerns of CV.

1: https://twitter.com/pjreddie/status/1230524770350817280

black_puppydog · 3y ago

Not only did he quit CV, he also seems to have fun outside the field: https://twitter.com/pjreddie/status/1504180525656801280

Good on him! :)

amelius · 3y ago

> because of the military and privacy concerns of CV.

If all the morally good people quit a field, and we're left with only the morally bad people, is that a good thing?

npRandom (OP) · 3y ago

I agree. The name YOLO is heavily abused (a good example: https://github.com/jinfagang/yolov7). However, you should note that the research team of YOLOv5 is also not the original one. As @garblegarble mentioned, the original research group stopped working on it (https://news.ycombinator.com/item?id=31918087).

basedbertram · 3y ago

> But note that YOLOv7 doesn't meant to be a successor of yolo family, 7 is just a magic and lucky number. Instead, YOLOv7 extend yolo into many other vision tasks, such as instance segmentation, one-stage keypoints detection etc..

Are you kidding me?

echelon · 3y ago

At this point so many different people have reused "YOLOv{n}" that the lineage is broken. I don't know how or why it happened, but it continues to be a thing.

chriskanan · 3y ago

I assume this happened because Joseph Redmon, the creator of v1 - v3, left the field of computer vision [1]. So then other people just took (stole?) the name.

[1]. https://syncedreview.com/2020/02/24/yolo-creator-says-he-sto...

lamfm95 · 3y ago

Have you read the article or just tried to be the 1st comment? They did mention the reason, which seems sensible to me.

singularity2001 · 3y ago · 7 in thread

Is there any progress on combining object detection with object segmentation? So instead of bounding boxes we get the true shape of objects? I know segmentation exists, just wondering about integration with YOLO or similar.

Q6T46nT668w6i3m · 3y ago

Yes, look into instance or panoptic segmentation. The most popular method is a region-based network that jointly regresses bounding box coordinates alongside an object mask and class label.

singularity2001 · 3y ago

Thanks. The next step would be combining it with text-image foundation models such as CLIP (https://github.com/openai/CLIP) so that the model no longer depends on a limited set of predefined labels (COCO…), right?

Also occlusion inference would be fantastic, so that we can select between the visible parts of the object and the whole shape (behind trees etc).

Exciting decade.

yeldarb · 3y ago

Yes, this is called "instance segmentation". There's a YOLO-based instance segmentation model called YOLACT.

singularity2001 · 3y ago

Wonderful, thanks.

It says backbone: Resnet101-FPN in https://github.com/dbolya/yolact ?

Anyone else looking for a pip installable solution: I found https://github.com/ayoolaolafenwa/PixelLib

And most current: https://github.com/yeliudev/catnet

dimatura · 3y ago

This task is called instance segmentation and is an active research topic. Mask R-CNN is relatively old these days, but still might be the most popular approach. There also happen to be a few approaches for the task that take methodological inspiration from YOLO, e.g. YOLACT (which clearly also pays homage in name).

genewitch · 3y ago

Is this "rotoscoping"?

klysm · 3y ago

In real time?

formerly_proven · 3y ago · 4 in thread

Is this another fraud like YOLOv5?

rjdagost · 3y ago

Calling YOLOv5 a "fraud" is a bit harsh. It has many excellent aspects for practitioners: easy to use, fast inference time, scalable model architecture, and it has many helpful utilities built in for model deployment. In my experience, in real use cases the models achieve about the same precision / recall / mAP as "state of the art" methods that report better stats on benchmarks.

saynay · 3y ago

All YOLOs past v3 will be, to some extent, since the original author won't be releasing any more versions. This one looks to be in PyTorch, so it is also not in Darknet. I am not really sure what makes it "YOLO" anymore other than being a single-shot detector, but the author claims they took inspiration from the techniques in the original YOLO papers.

npRandom (OP) · 3y ago

I compared YOLOv5 and v6 on several images, and v6 outperformed v5 by ~10% in the confidence level of the labels.

eis · 3y ago

Comparing confidence metrics of the networks themselves is like comparing two athletes by asking them each how good they are and declaring athlete B the winner of the race because he thought he was better than athlete A thought about himself.
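
To make the point concrete, here is a minimal sketch in which a detector reports higher confidence while matching fewer labeled objects. The boxes, scores, and "models" are made up for illustration, and the greedy IoU matching is a simplified stand-in for a proper benchmark evaluation such as COCO mAP:

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(detections, ground_truth, iou_threshold=0.5):
    """Greedy matching: detections are visited from most to least
    confident, and each ground-truth box can be claimed only once."""
    unmatched = list(ground_truth)
    true_positives = 0
    for box, _score in sorted(detections, key=lambda d: d[1], reverse=True):
        best = max(unmatched, key=lambda g: iou(box, g), default=None)
        if best is not None and iou(box, best) >= iou_threshold:
            unmatched.remove(best)
            true_positives += 1
    precision = true_positives / len(detections) if detections else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


def mean_confidence(detections):
    """Average of the scores the model assigns to its own detections."""
    return sum(score for _box, score in detections) / len(detections)


# Hypothetical labels and outputs, invented for this example.
ground_truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
# "Model A": modest confidence, both boxes well localized.
model_a = [((0, 0, 10, 10), 0.60), ((20, 20, 30, 30), 0.55)]
# "Model B": very confident, but one box is in the wrong place.
model_b = [((0, 0, 10, 10), 0.95), ((40, 40, 50, 50), 0.90)]

for name, dets in (("A", model_a), ("B", model_b)):
    p, r = precision_recall(dets, ground_truth)
    print(f"model {name}: mean confidence={mean_confidence(dets):.2f} "
          f"precision={p:.2f} recall={r:.2f}")
```

In this toy setup model B is the more confident one and the less accurate one, which is why the comparison has to be made against ground truth rather than against the scores the networks assign to themselves.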

turdnagel · 3y ago · 4 in thread

With graphics cards prices coming down, I'm considering purchasing one to mess around with GPU-based ML. Is a model like YOLOv6 runnable on a modern single GPU? If so what would get me the best bang for my buck?

sdlion · 3y ago

For training, more GPU RAM lets you train at higher resolutions, in less time, and with better results. Before data is fed to a model, it is resized to a "network dimension" (the YOLOv4 default is 416x416 px, if I recall correctly). Training groups several samples and processes them together, in "batches". For better generalization you want bigger batches (so that more different images are fed at the same time).

With a 3060 (non-Ti) you'll have 12GB of GPU RAM; with that I think you can run the default settings (network size, batch size and subdivision of batches) for the YOLOv4 model. If you want to go to 512px, you might have to increase the subdivision (create more sub-batches) or reduce the batch size. If I recall correctly, you can find 3070s with less than 12GB of RAM, so in trying to pursue faster training times (I'm not talking about inference, i.e. using the model to actually recognize something) you might not be able to train with the broader range of options that can improve your accuracy.
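
The batch and subdivision trade-off above comes down to simple arithmetic. A rough sketch, assuming Darknet-style `batch` and `subdivisions` settings, where `batch / subdivisions` images sit on the GPU at once. The function names and the memory proxy (images in flight times pixels per image) are illustrative assumptions, not a real memory model:

```python
def mini_batch(batch, subdivisions):
    """Number of images processed on the GPU in one step."""
    if batch % subdivisions != 0:
        raise ValueError("batch must be divisible by subdivisions")
    return batch // subdivisions


def relative_load(batch, subdivisions, side, ref_side=416):
    """Crude proxy for activation memory: images in flight times pixels
    per image, expressed in units of one ref_side x ref_side image."""
    return mini_batch(batch, subdivisions) * (side * side) / (ref_side * ref_side)


# Illustrative settings only: 64 images per weight update, 16 sub-batches.
base = relative_load(64, 16, 416)     # 4 images at 416 px
bigger = relative_load(64, 16, 512)   # same mini-batch at 512 px
halved = relative_load(64, 32, 512)   # 512 px with twice the subdivisions

print(mini_batch(64, 16), base, round(bigger, 2), round(halved, 2))
```

Under this proxy, going from 416 px to 512 px raises the per-step load by roughly half, and doubling the subdivisions brings it back below the starting point at the cost of more, smaller steps per update.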

KingOfCoders · 3y ago

I'm training YOLO on a 2080 Ti; it works fine, but YMMV. Waiting for a 4080 (Ti).

genewitch · 3y ago

Relatedly, someone linked an "all in one ML imaging software" called chaos vision or something, and the first thing on the linked page was "you probably need an Nvidia RTX 3090, or another Nvidia card with 24GB of memory".

I've tested a 'machine vision for image tagging' self-hosted service and it seemed reasonably responsive, CPU only, too - but I ran a pre-trained model for that.

ekleraki · 3y ago

I would wait for the 4000 series from Nvidia, which should arrive this fall, and then make a purchase. The best bang for your buck will likely be a 4070, or a discounted 3080.

franciscop · 3y ago · 3 in thread

Is this available for https://coral.ai/ somehow (USB accelerator)? Would it be difficult to convert it? I've played with the USB accelerator and it's cool, but would love to use some of these better algorithms since I found the default available ones were lacking.

joshvm · 3y ago

You can use yolov5 - here's a repo I made for an Ultralytics competition: https://github.com/jveitchmichaelis/edgetpu-yolo

Note: the typical constraints are RAM and changes to the EdgeTPU compiler, which now fails to convert larger models. Previously (version 15?) it would delegate layers to the CPU, but now it just doesn't work at all for large input sizes.

Also, while it works, I think it's unlikely to be much better than a well-trained MobileNet SSD. The advantage is that you can train in PyTorch and go from there; training quantised/edge models in TensorFlow is tricky.

codeinassembly · 3y ago

It is available via ONNX, which can be converted to TensorFlow weights. From there, it's possible to perform post-training quantization, and it should finally be available for use with Coral. However, there's a good chance some operations aren't yet supported by their chip, and accuracy will surely take a hit.

eis · 3y ago

I see you are working at "dagshub". Maybe you can let the people know that it's not a good show to create fake accounts here to push the story and leave useless praising comments.

bArray · 3y ago · 2 in thread

https://github.com/meituan/YOLOv6/blob/main/docs/About_namin...

> P.S. We are contacting the authors of YOLO series about the naming of YOLOv6.

You should ask _before_ publishing, not _after_.

They claim it runs faster and is more accurate than YOLOv5, yet requires 3x as much computation (GFLOPs)? Something doesn't add up here.

There is unbelievably little information about the architecture too. Unfortunately it's not in a format where I can easily throw the cfg in and visualize it: https://gitlab.com/danbarry16/darknet-visual

This appears to be on purpose to advertise DagsHub: https://dagshub.com/pricing

saynay · 3y ago

The individual ops could be faster, so even though there are more of them, the overall speed is quicker. The authors mention using more 16-bit ops, so that might be part of the reason?
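
A back-of-the-envelope sketch of why more GFLOPs need not mean slower inference: latency depends on the work per image divided by the throughput the hardware actually achieves on those ops. All numbers below are hypothetical, not measurements of YOLOv5 or YOLOv6:

```python
def latency_ms(gflops_per_image, achieved_tflops):
    """Idealized latency in milliseconds.

    GFLOP divided by TFLOP/s gives milliseconds directly, because
    1 GFLOP / (1 TFLOP/s) = 1e9 / 1e12 s = 1 ms.
    """
    return gflops_per_image / achieved_tflops


# Hypothetical "small" model: few FLOPs, but ops that use the GPU poorly.
small = latency_ms(gflops_per_image=15.0, achieved_tflops=5.0)
# Hypothetical "large" model: 3x the FLOPs, but hardware-friendly ops
# (for example 16-bit math) that reach 4x the achieved throughput.
large = latency_ms(gflops_per_image=45.0, achieved_tflops=20.0)

print(f"small: {small} ms, large: {large} ms")
```

With these made-up figures the model with three times the computation still finishes sooner, so a FLOP count alone does not settle which network is faster in practice.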

ur-whale · 3y ago

> There is unbelievably little information about the architecture too

Fair enough, but the repo at git clone https://dagshub.com/nirbarazida/YOLOv6 seems to contain somewhat standard torch code.

So it's not exactly like the architecture is a secret.

The weights are a different story, of course.

amelius · 3y ago · 2 in thread

They cut the y-axis in the graph. The improvement is less dramatic than they want to make it seem.
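
A quick sketch of how much a truncated axis can exaggerate a gap. The scores and the axis start below are hypothetical, not read from the article's graph:

```python
def apparent_ratio(a, b, axis_start):
    """Ratio of drawn bar heights when the y-axis starts at axis_start."""
    if axis_start >= min(a, b):
        raise ValueError("axis_start must be below both values")
    return (a - axis_start) / (b - axis_start)


new_score, old_score = 43.0, 41.0  # hypothetical benchmark scores

true_ratio = apparent_ratio(new_score, old_score, axis_start=0.0)
drawn_ratio = apparent_ratio(new_score, old_score, axis_start=40.0)

print(f"true ratio: {true_ratio:.3f}, drawn ratio: {drawn_ratio:.1f}")
```

Here a gap of about 5% is drawn as a bar three times as tall, purely because of where the axis begins.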

rjdagost · 3y ago

Welcome to computer vision / machine learning research!

toxik · 3y ago

It is absolutely not common in CV or ML to use such underhanded tricks. Peer review is allergic to it.

notme1234 · 3y ago · 1 in thread

What is the best object detector that can run on a Raspberry Pi 2? I checked tiny YOLO, which requires fewer resources, but it is much less accurate than the regular YOLO. I'm looking for the best accuracy with the lowest CPU/RAM requirements. I mainly need this to tell whether an image contains a person (still images from a network camera, which publishes an image upon motion detection).

sdlion · 3y ago

It would seem that it depends on the architecture you will be using: whether it is an ARM, GPU, mobile GPU processor, etc. This comment from the author of YOLOv4 mentions that NanoDet is more suitable for ARM CPUs: https://twitter.com/alexeyab84/status/1436377831974506496

martingoodson · 3y ago

Just to clear things up: Joseph Redmon (who made the first YOLO) has anointed Alexey Bochkovskiy as the keeper of the flame [1]. Alexey is a very careful researcher and does a ton of performance evaluation on his models. His results are to be trusted.

He gave a great talk at the London Machine Learning Meetup in April, if you’re interested [2]. (Full disclosure: I run the meetup)

[1] https://mobile.twitter.com/pjreddie/status/12538910781821992...

[2] https://youtu.be/nxOzeTmqe3Y

hvdijk · 3y ago

> We can clearly see that YOLOv6s detects more objects in the image and has higher confidence about their label.

If you actually look at the images they provide directly above:

In the first image, the older one detects one extra tie. In the second image, the objects detected are the same. In the third image, the older one detects a stop sign, and this new network (no, let's not call it YOLOv6) appears to get confused by the two cars behind one another and detects two objects, but the bounding boxes of the objects include both; it doesn't look like it actually separates them. But to be fair to them, they do detect an additional person on the left.

rocauc · 3y ago

Joseph Redmon was the original author of the YOLO family of models, up through YOLOv3. Alexey Bochkovskiy, a maintainer of Darknet (the framework for the original three YOLO models), published YOLOv4. Glenn Jocher used “YOLOv5” and showed the ML community that you can, but not without controversy[1], force a name into existence with adoption.

It appears the authors of YOLOv6 are aiming to employ a similar clever naming strategy.

I’m looking forward to more benchmarks before getting too excited.

[1] https://blog.roboflow.com/yolov4-versus-yolov5/

stathibus · 3y ago

If yolov6 is really a significant improvement, the article does a very poor job explaining it, but does a decent job of hyping it up for people who know nothing about the field. Who is the audience here?

isaacfrond · 3y ago

You can try version 5 on your iPhone in real time.

https://apps.apple.com/us/app/idetection/id1452689527

sdlion · 3y ago

Is there a reason not to compare it with the YOLOv4 family? If I recall correctly, the advantage of YOLOv5 over YOLOv4 is still disputed and YOLOv4-tiny seems widely used.

xbar · 3y ago

A distinct name should have been chosen.

toxik · 3y ago

Curiously, a host of newly created accounts are posting positive comments here. Is this astroturfing?

airbreather · 3y ago

Seems like the models are able to pick out a lot of things, but they don't know what a strawberry is...

inbars · 3y ago

Very cool

inbars · 3y ago

Very cool!
