Correct me if I'm wrong, but you can pick which dated GPT model snapshot to use and expect it not to act as a continually changing black box. I've been using the API for a long time and have always been able to pin the version.
For example: gpt-4-0314, gpt-3.5-turbo-0613, and so on.
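As a minimal sketch of what pinning looks like in practice: the dated snapshot goes in the `model` field of the chat completions request, while a bare alias like `gpt-4` can be re-pointed at newer snapshots over time. The code below only builds the request payload (actually sending it needs an API key and an HTTP client); the prompt text is made up for illustration.

```python
import json

def build_request(prompt, model="gpt-4-0314"):
    # "model" selects a frozen, dated snapshot; an undated alias
    # like "gpt-4" may silently move to newer snapshots.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_request("Summarize this ticket.", model="gpt-3.5-turbo-0613")
print(json.dumps(payload, indent=2))
```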
The latency issue is definitely real. Ideally the extra cost could be limited to a small percentage of hard cases, which you first have to identify.
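One hypothetical way to identify those hard cases is a cascade: run a cheap local heuristic first and escalate to the slow, expensive model only when the heuristic is unsure. Everything here is a stand-in for illustration; `classify_cheap` fakes a cheap classifier with a keyword lookup, and the escalation branch is where a real call to the pinned model would go.

```python
def classify_cheap(text):
    """Stand-in cheap classifier: returns (label, confidence)."""
    positive = {"great", "good", "love"}
    negative = {"bad", "terrible", "hate"}
    words = set(text.lower().split())
    pos, neg = len(words & positive), len(words & negative)
    if pos > neg:
        return "positive", 0.9
    if neg > pos:
        return "negative", 0.9
    return "neutral", 0.3  # low confidence -> treat as a hard case

def route(text, threshold=0.8):
    label, conf = classify_cheap(text)
    if conf >= threshold:
        return label, "cheap"
    # Hard case: in practice, call the pinned GPT snapshot here
    # and accept its latency/cost for this small fraction of inputs.
    return "needs-llm", "expensive"
```

The point is that only inputs falling below the confidence threshold pay the latency and cost of the big model.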