"Roughly, RAG is runtime prompt engineering where you build a system to dynamically add relevant things to your prompt before you ask the agent for an answer."
For sure, it's only worth doing if you actually have so much relevant data that it doesn't fit in the context! This is definitely the case for us for this problem, but it's not universal.
To me the RAG hype is just the sudden rediscovery of information retrieval by money hounds that did not care about AI/ML for decades and now are in panic mode due to FOMO.
For me it's nice to see the re-use of existing infra - big query. Personally, I was rooting for Postgres, but your logic of why not makes sense! Great post.
Reading this was very valuable. I really appreciate the Vespa mention and introduction to their Multi-Vector HNSW Indexing - I’ve recently thought a lot about how difficult chunking is and this seems like a promising avenue.