Our current setup requires touching several components whenever we add a new feature. We spend more time maintaining infrastructure than building actual products.
The real pain comes when stakeholders get excited about new capabilities. Last week, they saw Google Gemini's new native image generation and editing features and wanted them integrated ASAP. This requires:
- Adding new API integration code to handle LLMs returning images
- Updating a cost calculator to support this case
- Updating our storage system to automatically handle images in LLM responses and store them in S3
- Fixing whatever else blows up that I haven't thought about yet
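To make the scope concrete, here is a minimal sketch of the kind of change this implies. It is not our actual code. It assumes responses are modeled as a list of typed parts; every name here (`TextPart`, `ImagePart`, `BlobStore`, `InMemoryStore`, `estimate_cost`, `persist_images`) is hypothetical, the prices are placeholders rather than real provider rates, and the in-memory store only stands in for an S3-backed one.

```python
import hashlib
from dataclasses import dataclass
from typing import Protocol, Union


@dataclass(frozen=True)
class TextPart:
    text: str
    tokens: int  # token count as reported by the provider (assumed available)


@dataclass(frozen=True)
class ImagePart:
    data: bytes
    mime_type: str


# A response is a list of parts, so a new modality means a new part type.
Part = Union[TextPart, ImagePart]


class BlobStore(Protocol):
    """Hypothetical storage interface; an S3-backed class would implement it."""

    def put(self, key: str, data: bytes, content_type: str) -> str: ...


class InMemoryStore:
    """Stand-in for S3 that keeps blobs in a dict (illustration only)."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}

    def put(self, key: str, data: bytes, content_type: str) -> str:
        self.blobs[key] = data
        return f"mem://{key}"


# Placeholder prices, NOT real provider rates.
PRICE_PER_TEXT_TOKEN = 0.000002
PRICE_PER_IMAGE = 0.04


def estimate_cost(parts: list[Part]) -> float:
    """Sum a per-token cost for text parts and a flat cost per image part."""
    total = 0.0
    for part in parts:
        if isinstance(part, TextPart):
            total += part.tokens * PRICE_PER_TEXT_TOKEN
        else:
            total += PRICE_PER_IMAGE
    return total


def persist_images(parts: list[Part], store: BlobStore) -> list[str]:
    """Store every image part under a content-hash key and return the URLs."""
    urls = []
    for part in parts:
        if isinstance(part, ImagePart):
            key = hashlib.sha256(part.data).hexdigest()
            urls.append(store.put(key, part.data, part.mime_type))
    return urls
```

Even in this toy version, one new output type touches the response model, the cost calculator, and storage, which is exactly the coupling I'm complaining about.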
I'm spending about 70% of development time (if not more) on infrastructure-related tasks rather than product features, and this ratio is getting worse as we add more capabilities.
- How is your company handling this complexity? Any recommendations?
- Any open-source tools/frameworks that have significantly reduced your AI infrastructure complexity that more teams should know about?
- Which tools or approaches did you try that turned into maintenance nightmares, and that you'd warn others to avoid?
- Any AI-related third-party services you would recommend?
P.S. I'll share our stack in the comments, since I hit HN's max character limit.