Quantity is massively more important, to the point that you get much better results from a larger dataset with machine-generated labelling (or machine-generated object detection, if that's what you're building) than from a smaller one, even if the smaller one has expert labelling.
That said, if you've got a million photos you could probably do some pretty interesting things with very large-scale fine-tuning. Or, if you know many other people with similar stockpiles of photos, you may be able to get an entry-level dataset together if you all pool them.
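
To make the machine-generated labelling idea a bit more concrete, here is a rough sketch of what it could look like. Everything in it is made up for illustration: `pseudo_label`, `toy_predict`, and the filenames are hypothetical, and the confidence cutoff is my own assumption rather than anything required. In practice `predict` would wrap whatever existing classifier or detector you have access to.

```python
from typing import Callable, Iterable, List, Tuple


def pseudo_label(
    items: Iterable[str],
    predict: Callable[[str], Tuple[str, float]],
    min_confidence: float = 0.9,  # assumed cutoff, tune for your data
) -> List[Tuple[str, str]]:
    """Label items with an existing model, keeping only confident predictions."""
    labelled = []
    for item in items:
        label, confidence = predict(item)
        # Drop low-confidence guesses so the noisiest labels stay out of the dataset.
        if confidence >= min_confidence:
            labelled.append((item, label))
    return labelled


def toy_predict(filename: str) -> Tuple[str, float]:
    """Hypothetical stand-in for a real model: guesses from the filename."""
    if "cat" in filename:
        return "cat", 0.95
    if "dog" in filename:
        return "dog", 0.92
    return "unknown", 0.30


photos = ["cat_001.jpg", "dog_002.jpg", "blurry_003.jpg"]
result = pseudo_label(photos, toy_predict)
print(result)
```

With a million photos, even a fairly strict cutoff would probably still leave you with far more labelled examples than you could ever get labelled by hand.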