We want developers to iterate quickly on system designs: How should we break down the task? Where do we call LMs? What should they do?
---
If you can guess the right prompts right away for each LLM, tweak them well for any complex pipeline, and rarely have to change the pipeline (and hence all prompts in it), then you probably won't need this.
That said, it turns out that (a) prompts that work well are very specific to particular LMs, large & especially small ones, (b) prompts that work well change significantly when you tweak your pipeline or your data, and (c) prompts that work well may be long and time-consuming to find.
Oh, and often the prompt that works well changes for different inputs. Thinking in terms of strings is a glaring anti-pattern.