0 points
edg5000
3mo ago
So limiting max context length also reduces VRAM needs a bit? If the KV cache is 20% of total memory, capping context at 1/10th of the original length would shrink the cache to 2% of the original total, i.e. an 18% reduction in total memory.
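To make the arithmetic concrete, here is a minimal sketch, assuming the cache grows linearly with context length and everything else (weights, activations) stays fixed. The function name and parameters are made up for illustration.

```python
def total_memory_reduction(cache_fraction: float, context_scale: float) -> float:
    """Fraction of total memory saved when context is capped.

    cache_fraction: share of total memory taken by the KV cache (0..1).
    context_scale:  new max context divided by the old max context (0..1).
    Assumes the cache scales linearly with context and nothing else changes.
    """
    if not 0.0 <= cache_fraction <= 1.0:
        raise ValueError("cache_fraction must be between 0 and 1")
    if not 0.0 <= context_scale <= 1.0:
        raise ValueError("context_scale must be between 0 and 1")
    # Only the cache shrinks; the saved part is the cache share that disappears.
    return cache_fraction * (1.0 - context_scale)


# The numbers from the comment: cache is 20% of total, context capped at 1/10th.
saving = total_memory_reduction(0.20, 0.10)
print(f"{saving:.0%} of total memory saved")
```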
valine
3mo ago
Yup, exactly. In principle it helps with both inference speed, by reducing memory bandwidth usage, and the memory footprint of your KV cache.
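For a rough sense of why the footprint tracks context length, here is a back-of-the-envelope sketch. It assumes a plain transformer KV cache (one key and one value vector per token, per layer, per KV head) with no cache quantization or allocator overhead. The model dimensions below are hypothetical, not taken from any specific model.

```python
def kv_cache_bytes(
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    context_len: int,
    bytes_per_value: int = 2,  # assumption: fp16/bf16 cache entries
    batch_size: int = 1,
) -> int:
    """Approximate KV cache size in bytes for a given context length."""
    # Factor of 2: one key tensor plus one value tensor per layer.
    return (
        2 * n_layers * n_kv_heads * head_dim
        * context_len * bytes_per_value * batch_size
    )


# Hypothetical model: 32 layers, 8 KV heads, head dimension 128.
full = kv_cache_bytes(32, 8, 128, context_len=32_000)
capped = kv_cache_bytes(32, 8, 128, context_len=3_200)

print(f"full context:   {full / 2**30:.2f} GiB")
print(f"capped context: {capped / 2**30:.2f} GiB")
print(f"ratio: {capped / full:.2f}")
```

Every factor other than `context_len` is fixed by the model, so the cache size (and the bytes read from it per decoded token) scales linearly with the context limit.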