That’s just not an interesting or rewarding way to interact with a computer, and the last thing I want to do is add long wait times and nickel-and-dime costs to the process. Layer on using different LLMs for different tasks, trying them out against each other, and cross-checking their output, and it becomes a mind-numbingly indirect way to get anything accomplished, one that in the end teaches me nothing and develops no skill I enjoy practicing.
If it works for you, great, but even the most honest and genuine fans make it sound like a nightmare to me.
It doesn't seem likely an LLM will ever do that. Maybe at a certain point of sophistication? But if the model is regularly changing - which almost all of them will be, if they're expected to stay up-to-date - there's a strong chance they'll behave differently every time they're used.
(I've been getting different behaviour in even relatively narrow ML-based systems for years. Google Assistant is my prime example - I regularly use the phrase "add to my calendar on the 20th of September at 5pm, go to the park". Almost all the time, it works perfectly. But a couple times a year at least, it won't process this as an action - it just does a Google web search for this string.)
So yeah, prompt "engineering" is indeed a silly term, but software "engineering" kicked off the dilution of that word ages ago. And GPT models can be inspected and measured on input and output, prompts can be analyzed for their effects and usefulness, and temperature settings even directly control some degree of determinism. It's not like models change on a whim unless you're just using end-user products: Anthropic, Hugging Face, AWS, and OpenAI all let you pin a specific model version in your API calls and stick with it for a long time. If you're self-hosting a fine-tuned Llama 70B, nobody will ever force you to update it once you get it doing a task to your expectations. The quality of deterministic behavior in AI is currently lower than that of Excel or C code, but it's also serving a wholly different purpose: people want it to be creative and produce novel, nondeterministic outputs, so comparing the two is a bit silly.
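The temperature point can be made concrete with a toy example. This is a minimal plain-Python sketch of softmax sampling over token logits, not any vendor's actual decoder: as temperature drops to zero the distribution collapses onto the highest-scoring token (greedy decoding), so the output is identical on every run regardless of the random seed, while a higher temperature reintroduces the nondeterminism. The logit values here are made up for illustration.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Sample an index from softmax(logits / temperature).

    As temperature -> 0 this collapses to argmax, i.e. fully
    deterministic output regardless of the RNG state.
    """
    if temperature <= 0:
        # Greedy decoding: always pick the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Inverse-CDF sampling from the resulting distribution.
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(logits) - 1

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three tokens

# Temperature 0: the same token wins for every seed.
greedy = {sample_token(logits, 0.0, random.Random(s)) for s in range(10)}
print(greedy)  # {0}

# High temperature: different seeds can pick different tokens.
hot = {sample_token(logits, 5.0, random.Random(s)) for s in range(10)}
print(len(hot) > 1)  # True
```

Hosted APIs expose the same dial (a `temperature` parameter, usually alongside a pinned model-version string), though even at temperature 0 real services don't always guarantee bit-identical outputs across infrastructure changes.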
I think of it as similar to Googling in the early days. What started as a skill I had to pick up became second nature and I could find things faster than my family without even really thinking about what I was doing. It just became natural.
Most of my colleagues communicate with ChatGPT in broken English, or they ask a question while leaving out crucial details about their problem. They’re always surprised when I’m able to get a useful response from ChatGPT when they couldn’t. It’s comical sometimes.
I 100% hear you on the “not a fun way to interact” part, though. To each their own. I personally enjoy it; it’s like a rubber duck that can actually talk back. :) Not for everyone, though.
The problem is that GenAI is a complete black box with nondeterministic outputs. I can write code and I know with a very high degree of confidence what I expect it to do. Asking an LLM or a generative image program for something, I have no idea what it'll give me. It gives no feedback other than results, which may or may not be what I want. If not, I have to reverse engineer what I think it might want me to say in order to get desired results. And the same query placed another time might give a completely different answer. I don't deny that it can do some impressive things given the correct inputs, but I am not inclined to spend my time searching for the magic words.
You're showing a fundamental misunderstanding (or ignorance) of the whole problem domain.
For starters, you place an awful lot of emphasis on what you think is "carefully crafting English language prompts". That makes as much sense as characterizing the job of a database engineer as "carefully crafting quasi-English language prompts". The language used is completely irrelevant, and the fact that in some circumstances you can use something resembling natural language to build up context does not take away from that.
Any remotely honest and objective analysis of the topic would start from similar activities, beginning with the areas of work where LLMs are actually being used. For image/video generation you need to look at graphic design, video editing, video production, illustration, etc. These activities are, by their own nature, iterative and exploratory. Then for text you have the work of copywriters and editors, and even writers and essayists. That work is fundamentally iterative and exploratory. Then you have work like exploratory data analysis/statistics/data mining. Every aspect of that work is iterative, even the reporting part.