Oh like hell it requires no skill.
You tell me how you'll generate better photos, improve dialogue coherence across multiple speakers, and control camera direction and movement (something we're using LLMs for too as we experiment with special-purposed models).
All of this is not known a priori, by the way. And I won't accept building a database or lookup table as an answer.
I also want to know how you'll test, benchmark, and refine.
You also need to budget for inference complexity.
I'm waiting :)
I can do this myself, but it is a full time job. I am so busy with all other aspects of my business I'm looking for people to bring on board.