FWIW, this is a project I'm REALLY interested in. If you go open source, I'd totally get involved.
I have a mirror of *booru image galleries (basically, anime/manga images, but REALLY well tagged: 20+ tags per image) that I've occasionally been meaning to see if I can turn into a google-deep-dream-like nightmare to make the worst hentai possible (it started as an idea for a terrible idea hackathon [1]).
I have a fair bit of experience writing web-scrapers [2], but no idea what I'm doing with neural net stuff. I keep meaning to spend time figuring out how to do anything with it, but other projects (and work) keep me distracted.
Anyways, if you have any use for 5,496,863 images (probably mostly hentai) with 196,852,748 tags, hit me up.
Or you could just run the scraper [3] yourself, but I hope you have 5+ TB of free disk space.
1: http://www.stupidhackathon.com/
2: https://github.com/fake-name
3: https://github.com/fake-name/DanbooruScraper