undefined | Better HN

0 pointsweird-eye-issue6mo ago0 comments

They absolutely are segregated

With OpenAI at least you can specify the cache key and they even have this in the docs:

Use the prompt_cache_key parameter consistently across requests that share common prefixes. Select a granularity that keeps each unique prefix-prompt_cache_key combination below 15 requests per minute to avoid cache overflow.

0 comments

4 comments · 2 top-level

ambicapter6mo ago· 1 in thread

> Select a granularity that keeps each unique prefix-prompt_cache_key combination below 15 requests per minute to avoid cache overflow.

Why below a certain number? Usually in caches a high number of requests keeps the cached bit from expiring or being replaced, no?

weird-eye-issueOP6mo ago

It needs to go to the same machine and machines can only handle so many requests

psadri6mo ago· 1 in thread

Does anyone actually compute / use this key feature? Or do you rely on implicit caching? I wish HN had a comment with a poll feature.

weird-eye-issueOP6mo ago

It would be important to use for relatively high traffic use cases

Let's say you have a chatbot with hundreds of active users, their requests could get routed to different machines which would mean the implicit caching wouldn't work

If you set the cache key to a user id then it would be more likely each user's chat could get cached on subsequent requests

j / k navigate · click thread line to collapse