I don't understand the logic behind it. Since you're not part of the official series, why create the confusion? Now the "official" YOLO will release v6 and then what? Or do you expect them to skip it because you already made v6?
To me, it seems disrespectful; just add a different suffix.
I agree that it's disrespectful, although FYI the 'official' YOLO is done - the author, Joseph Redmon, has quit the field[1] because of the military and privacy concerns of CV.
Good on him! :)
If all the morally good people quit a field, and we're left with only the morally bad people, is that a good thing?
Are you kidding me?
[1]. https://syncedreview.com/2020/02/24/yolo-creator-says-he-sto...
Also occlusion inference would be fantastic, so that we can select between the visible parts of the object and the whole shape (behind trees etc).
Exciting decade.
It says backbone: Resnet101-FPN in https://github.com/dbolya/yolact ?
Anyone else looking for a pip installable solution: I found https://github.com/ayoolaolafenwa/PixelLib
And most current: https://github.com/yeliudev/catnet
I've tested a 'machine vision for image tagging' self-hosted service and it seemed reasonably responsive, CPU only, too - but I ran a pre-trained model for that.
Note: typical constraint is RAM and changes to the EdgeTPU compiler which now fails to convert larger models. Previously (version 15?) it would delegate layers to the CPU, but now it just doesn't work at all for large input sizes.
Also, while it works, I think it's unlikely to be much better than a well-trained MobileNet SSD. The advantage is you can train in PyTorch and go from there; training quantised/edge models in TensorFlow is tricky.
> P.S. We are contacting the authors of YOLO series about the naming of YOLOv6.
You should ask _before_ publishing, not _after_.
They claim it runs faster and is more accurate than YOLOv5, yet requires 3x as much computation (GFLOPs)? Something doesn't add up here.
There is unbelievably little information about the architecture too. Unfortunately it's not in a format where I can easily throw the cfg in and visualize it: https://gitlab.com/danbarry16/darknet-visual
This appears to be on purpose to advertise DagsHub: https://dagshub.com/pricing
Fair enough, but the repo at git clone https://dagshub.com/nirbarazida/YOLOv6 seems to contain somewhat standard torch code.
So it's not exactly like the architecture is a secret.
The weights are a different story, of course.
He gave a great talk at the London Machine Learning Meetup in April, if you’re interested [2]. (Full disclosure: I run the meetup)
[1] https://mobile.twitter.com/pjreddie/status/12538910781821992...
If you actually look at the images they provide directly above:
In the first image, the older one detects one extra tie. In the second image, the objects detected are the same. In the third image, the older one detects a stop sign, and this new network (no, let's not call it YOLOv6) appears to get confused by the two cars behind one another: it detects two objects, but each bounding box includes both cars, so it doesn't look like it actually separates them. But to be fair to them, they do detect an additional person on the left.
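For anyone who wants to put a number on the "two boxes covering both cars" point, here is a minimal sketch of intersection-over-union (IoU), the usual overlap measure between two boxes. The coordinates are made up for illustration and are not read off the images in the post; the `iou` helper is my own, not something from the YOLOv6 repo.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# Hypothetical case 1: one box per car, overlapping only slightly.
separate = iou((0, 0, 100, 60), (90, 0, 190, 60))

# Hypothetical case 2: both boxes span both cars, so they nearly coincide.
merged = iou((0, 0, 190, 60), (5, 0, 185, 62))

print(f"separate cars: IoU = {separate:.2f}")
print(f"merged boxes:  IoU = {merged:.2f}")
```

If two detections of the same class have an IoU close to 1, they are most likely the same region reported twice rather than two separated objects, which is what the third image looks like to me.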
It appears the authors of YOLOv6 are aiming to employ a similar clever naming strategy.
I’m looking forward to more benchmarks before getting too excited.