Edit: Similar with the "UI components" section, the long one is missing the UI while the short one is UI without the trigger to activate it. You'd probably combine the two, using state from the first to control the UI in the second (replacing the contents of the useEffect with the dialog API to get the modal effect).
You need to know the shape of the solution ...
And it feels like claude code has gotten more verbose with the multiline comments lately
One thing I found: asking the model to respond in structured JSON (with a strict schema) vs free-form text cuts token output by ~40% on average. The model stops "explaining itself" and just gives you the answer.
Also noticed that including a reference image in vision calls roughly doubles the input cost but improves accuracy enough that you save on retries. Net cost ended up lower for my use case.
Curious if you've measured the difference between asking for "concise" output vs actually constraining the response format.