Ask HN: Devs using LLMs, how are you keeping costs low for LLM calls locally?

5 pointsspruce_tips1y ago8 comments

My project has a multi step LLM flow using gpt-4o.

While developing new features/testing locally, the LLM flow frequently runs, and I use a bunch of tokens. My openAI bill spikes.

I've made some efforts to stub LLM responses but it adds a decent bit of complexity and work. I don't want to run a model locally with ollama because I need to output to be high quality and fast.

Curious how others are handling similar situations.