I agree that you need training data to build AI from scratch, much like you need lots of really smart developers, a mailing list, servers, and so on to build the Linux kernel from scratch. But having the training data and the training code won't get you the same result, which is different from something like open data in science, where the whole point is replicating results.