0 points · jw1224 · 2y ago · 0 comments

I lost countless hours debugging this. I finally concluded it must be unintentional because:

  1. It's undocumented. None of the regular rate limit responses are returned.
  2. You're charged for the full generation length. So if the output takes 10 minutes to generate, that's what you'll pay for (despite only getting half back).
  3. It defeats the point of the larger context limit models. Why offer a 32K model if it fails after ~6K tokens?
  4. The server response doesn't include any error code or message; it simply terminates unexpectedly. Hit any of the actual rate limits, and you get told about it.

I'd expect to be able to generate output until the model reaches its context limit, or a stop sequence is detected, or I hit an actual documented rate limit.
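
For anyone hitting the same thing, here is a minimal sketch of how a client might tell a clean finish from a silent cutoff. It assumes the response is streamed as server-sent events, that the last content chunk carries a non-null `finish_reason`, and that the stream closes with a `data: [DONE]` line. The helper name `read_stream` is made up for illustration and is not part of any SDK.

```python
import json


def read_stream(sse_lines):
    """Collect streamed text and report how the stream ended.

    Hypothetical helper: takes an iterable of raw SSE lines and returns
    (text, finish_reason, saw_done).
    """
    parts = []
    finish_reason = None
    saw_done = False
    for raw in sse_lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            saw_done = True  # assumed end-of-stream sentinel
            break
        choice = json.loads(payload)["choices"][0]
        parts.append(choice.get("delta", {}).get("content") or "")
        finish_reason = choice.get("finish_reason") or finish_reason
    return "".join(parts), finish_reason, saw_done


def was_truncated(finish_reason, saw_done):
    # No finish_reason or no sentinel means the connection dropped mid-output.
    return finish_reason is None or not saw_done
```

If `was_truncated` returns true, the connection closed before the model reported a stop condition, which matches the behavior described above. It only detects the cutoff; it doesn't recover the missing output.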

We're paying for these requests in full. We should get the full response back!

3 comments · 3 top-level

hnfong · 2y ago

Given that OpenAI is probably heavy on ML scientist types and lean on SRE types, I suspect there's just some nginx setting at the load balancer/edge servers that sets the timeout to 5 minutes, and it wasn't communicated to whoever was designing the API/writing the docs.

meandmycode · 2y ago

I wouldn't be surprised if this was an Azure-related issue, given some of the similar madness I've experienced on Azure.

SeanAnderson · 2y ago

You have fair concerns. Thanks for elaborating. Sorry you experienced all that trouble.