k9294 on Hacker News

Ask HN: AI infrastructure in production – what is your tech stack?

I'm working at an AI startup, and our infrastructure has grown increasingly complex. What started as a few simple model calls has evolved into systems that limit us and slow us down.

Our current setup requires touching several components whenever we add a new feature. We spend more time maintaining infrastructure than building actual products.

The real pain comes when stakeholders get excited about new capabilities. Last week, they saw Google Gemini's new native image generation and editing features and wanted them integrated ASAP. This requires:

- Adding new API integration code to handle LLMs returning images

- Updating a cost calculator to support this case

- Updating our storage system to automatically handle images in LLM responses and store them in S3

- Probably something will blow up that I haven't thought about yet
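
To make the first three steps concrete, here is a minimal sketch of one way they could fit together, assuming a provider-agnostic response type that carries both text and images. Every name here (`LLMResponse`, `ImagePart`, `estimate_cost`, `BlobStore`, `store_images`) and every price is hypothetical and only illustrative, not taken from any real provider or from an actual codebase. An in-memory store stands in for S3 so the example runs without credentials.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class ImagePart:
    """One image returned by a model (hypothetical shape)."""
    data: bytes
    mime_type: str = "image/png"


@dataclass
class LLMResponse:
    """Provider-agnostic response: text plus zero or more images."""
    text: str
    input_tokens: int
    output_tokens: int
    images: list[ImagePart] = field(default_factory=list)


# Hypothetical prices in USD; real prices differ per provider and model.
PRICE_PER_1K_INPUT_TOKENS = 0.001
PRICE_PER_1K_OUTPUT_TOKENS = 0.002
PRICE_PER_IMAGE = 0.04


def estimate_cost(resp: LLMResponse) -> float:
    """Token cost plus a flat per-image charge (an assumed pricing model)."""
    token_cost = (
        resp.input_tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS
        + resp.output_tokens / 1000 * PRICE_PER_1K_OUTPUT_TOKENS
    )
    return token_cost + len(resp.images) * PRICE_PER_IMAGE


class BlobStore(Protocol):
    """Minimal storage interface; an S3-backed class could implement it."""
    def put(self, key: str, data: bytes, content_type: str) -> None: ...


class InMemoryStore:
    """Stand-in for an S3 bucket so the sketch is self-contained."""
    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes, content_type: str) -> None:
        self.objects[key] = data


def store_images(resp: LLMResponse, store: BlobStore,
                 prefix: str = "llm-images") -> list[str]:
    """Store each image under a content-addressed key and return the keys."""
    keys = []
    for image in resp.images:
        digest = hashlib.sha256(image.data).hexdigest()
        extension = image.mime_type.split("/")[-1]
        key = f"{prefix}/{digest}.{extension}"
        store.put(key, image.data, image.mime_type)
        keys.append(key)
    return keys
```

The point of the shape is that the cost calculator and the storage layer both read from the same response type, so a new output modality means extending one dataclass rather than touching each component separately.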

I'm spending about 70% of development time (if not more) on infrastructure-related tasks rather than product features, and this ratio is getting worse as we add more capabilities.

- How is your company handling this complexity? Any recommendations?

- Any open-source tools/frameworks that have significantly reduced your AI infrastructure complexity that more teams should know about?

- Which tools or approaches did you try that became maintenance nightmares and you'd warn others to avoid?

- Any AI-related third-party services you would recommend?

P.S. I'll share our stack in the comments, since I hit HN's max character limit.

5 points by k9294 1y ago | 1 comment

