A model that randomly remixes human works until the result roughly matches a few words typed in by a human cannot be put on the same plane as a human expressing a concept or inner feelings using knowledge acquired by looking at other works.
When I look at art (be it paintings, stories, animation, or any other medium), I always think about the people behind it: what stories they are trying to tell, and what life lessons and ideals they are trying to get across.
A text prompt absolutely cannot convey the same amount of meaning as a work of art that takes actual time to make and refine. However amazing the generated output may look, it is inherently meaningless, because there is no human intent behind it to make something great and original.