cyanf
1y ago
They’re using the FS to cache the KV caches of past requests. That’s why they’re able to charge so little on a prompt cache hit.
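To make the idea concrete, here is a minimal sketch of a disk-backed prefix cache. It is not their implementation: the class `DiskPrefixCache`, the helper `block_keys`, the tiny `BLOCK` size, and the opaque byte strings standing in for KV tensors are all assumptions made for illustration. Each block of tokens gets a chained hash, so a key identifies the entire prefix up to that block, and a new request can reuse every block it shares with an earlier one.

```python
import hashlib
import tempfile
from pathlib import Path

BLOCK = 4  # tokens per cached block (hypothetical; chosen small for the example)


def block_keys(tokens):
    """Yield one chained hash per full block; each key covers the whole prefix so far."""
    h = hashlib.sha256()
    full = len(tokens) - len(tokens) % BLOCK  # ignore a trailing partial block
    for i in range(0, full, BLOCK):
        h.update(repr(tokens[i:i + BLOCK]).encode())
        yield h.hexdigest()


class DiskPrefixCache:
    """Hypothetical cache: one file per block, named by its prefix hash."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def store(self, tokens, kv_blocks):
        # kv_blocks are opaque bytes standing in for serialized KV tensors.
        for key, kv in zip(block_keys(tokens), kv_blocks):
            (self.root / key).write_bytes(kv)

    def lookup(self, tokens):
        """Return the cached KV blocks for the longest matching prefix."""
        hits = []
        for key in block_keys(tokens):
            path = self.root / key
            if not path.exists():
                break  # first miss ends the shared prefix
            hits.append(path.read_bytes())
        return hits


cache = DiskPrefixCache(tempfile.mkdtemp())
cache.store([1, 2, 3, 4, 5, 6, 7, 8], [b"kv0", b"kv1"])
reused = cache.lookup([1, 2, 3, 4, 9, 9, 9, 9])  # shares only the first block
```

In this sketch a hit means the server only has to run prefill on the tokens after the reused blocks, which is where the cost saving on a cache hit would come from.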
jpgvm
1y ago
Ahh, I missed that. Yes, prefix caching and RAG are two cases where you will want something like this at inference time.