A year+ later, the most interesting kernel of insight to us from dspy is autotuning a single prompt: it's an optimizable model just like any other. As soon as you have an eval framework in place for your prompts, having something like dspy tune your prompts on a per-LLM basis would be very cool. I'm not sure where they are on that; it seems to go against the grain of their focus. We're only now reaching the point where we would see ROI on that kind of thing, and it took a long time to get here.
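To make "a prompt is an optimizable model once you have evals" concrete, here is a minimal sketch of per-LLM prompt selection. It does not use dspy or its API. The LLM is modeled as a plain `prompt -> text` callable, and `toy_llm`, `CANDIDATES`, and `EVALSET` are hypothetical stand-ins for a real model client, real prompt variants, and a real eval set. The "optimizer" here is just exhaustive selection over candidate templates by exact-match accuracy.

```python
from __future__ import annotations

from collections.abc import Callable, Sequence

# Hypothetical model interface: full prompt string in, completion text out.
LLM = Callable[[str], str]


def score_prompt(template: str, llm: LLM, evalset: Sequence[tuple[str, str]]) -> float:
    """Exact-match accuracy of one prompt template against the eval set."""
    hits = 0
    for question, expected in evalset:
        output = llm(template.format(question=question))
        hits += output.strip() == expected
    return hits / len(evalset)


def tune_prompt(candidates: Sequence[str], llm: LLM,
                evalset: Sequence[tuple[str, str]]) -> tuple[str, float]:
    """Return the best-scoring template for this particular LLM (first one wins ties)."""
    scored = [(score_prompt(c, llm, evalset), c) for c in candidates]
    best_score, best_template = max(scored, key=lambda pair: pair[0])
    return best_template, best_score


# --- Toy usage (all hypothetical) ---
ANSWERS = {"2+2": "4", "3*3": "9"}


def toy_llm(prompt: str) -> str:
    # Stand-in model: only gives a bare answer when explicitly told to.
    question = prompt.split("Q: ")[-1]
    if "number only" in prompt:
        return ANSWERS[question]
    return f"The answer is {ANSWERS[question]}."


CANDIDATES = ["Q: {question}", "Answer with a number only.\nQ: {question}"]
EVALSET = [("2+2", "4"), ("3*3", "9")]

best, score = tune_prompt(CANDIDATES, toy_llm, EVALSET)
print(repr(best), score)
```

Because the eval set and the model are both inputs, rerunning `tune_prompt` with a different `llm` gives a per-model winner, which is the per-LLM tuning idea above. A real tuner would also generate new candidates rather than only choosing among fixed ones.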
We do run an agentic framework, so cross-prompt autotuning would be neat too, especially for how the orchestrator (e.g., CoT) composes with individual agents. We call this the "composition problem", and it's frustrating. However, again, dspy and friends do "too much" by also trying to be the agent framework & runtime, while we just want the autotuner.
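One way to picture cross-prompt tuning is to score orchestrator and agent prompts together with an end-to-end eval instead of tuning each in isolation. The sketch below is a hypothetical illustration, not dspy's approach: `toy_pipeline` assumes the orchestrator prompt decides a hand-off format (JSON or prose) and the agent prompt only parses one of them, so a prompt's score depends on which prompt it is paired with.

```python
from __future__ import annotations

from collections.abc import Callable, Sequence
from itertools import product

# Hypothetical pipeline: (orchestrator_prompt, agent_prompt, task) -> final answer.
Pipeline = Callable[[str, str, str], str]


def end_to_end_score(orch: str, agent: str, pipeline: Pipeline,
                     evalset: Sequence[tuple[str, str]]) -> float:
    """Accuracy of the whole orchestrator+agent composition on the eval set."""
    hits = sum(pipeline(orch, agent, task) == expected for task, expected in evalset)
    return hits / len(evalset)


def tune_jointly(orch_candidates: Sequence[str], agent_candidates: Sequence[str],
                 pipeline: Pipeline, evalset: Sequence[tuple[str, str]]):
    """Search the cross product of prompts; keep the first pair with the top score."""
    best = None
    for orch, agent in product(orch_candidates, agent_candidates):
        score = end_to_end_score(orch, agent, pipeline, evalset)
        if best is None or score > best[0]:
            best = (score, orch, agent)
    return best


# --- Toy usage (all hypothetical) ---
ANSWERS = {"task-a": "A", "task-b": "B"}


def toy_pipeline(orch: str, agent: str, task: str) -> str:
    # The hand-off only works when the agent reads the format the orchestrator emits.
    handoff_format = "json" if "JSON" in orch else "prose"
    agent_reads = "json" if "JSON" in agent else "prose"
    return ANSWERS[task] if handoff_format == agent_reads else "parse error"


ORCH_CANDIDATES = ["Plan step by step in prose.", "Plan step by step, emit JSON."]
AGENT_CANDIDATES = ["Read the plan as prose.", "Read the JSON plan."]
EVALSET = [("task-a", "A"), ("task-b", "B")]

BEST = tune_jointly(ORCH_CANDIDATES, AGENT_CANDIDATES, toy_pipeline, EVALSET)
print(BEST)
```

Exhaustive search over the cross product grows multiplicatively with each added agent prompt, so a practical autotuner would need something smarter (e.g. coordinate-wise or sampled search), but the scoring has to stay end-to-end for the composition effects to show up.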