undefined | Better HN

0 pointstmcneal2y ago0 comments

Pricing is listed on https://zerostep.com - you get 1,000 ai() calls per month for free, and then the cheapest paid plan is 2,000 ai() calls per month for $20, 4,000 for $40, etc. So basically you pay a penny per ai() call.

In terms of reliability - we have a hard dependency on the OpenAI API, so that's what will affect reliability the most. We're using GPT-3.5 and GPT-4 models, which have been fairly reliable, but we'll bump to GPT-4-Turbo eventually. Right now GPT-4-Turbo is listed as "not suited for production use" in OpenAI's docs: https://platform.openai.com/docs/models

0 pointstmcneal2y ago0 comments

0 comments

4 comments · 1 top-level

koreth12y ago· 3 in thread

That's one aspect of reliability, but the one I was more curious about was determinism. If I repeatedly run the same test suite on the same code base and the same data and configuration, am I guaranteed to get the same test results every time, or is it possible for ai() to change its mind about what actions to take?

tmcnealOP2y ago

Ah got it. So GPT is non-deterministic, but we somewhat handle that by having a caching layer in our AI. Basically if you make an ai() call, and we see that the page state is identical to a previous invocation of that exact AI prompt, then we will not consult the AI and install return you the cached result. We did this mainly to reduce costs and speed up execution of the 2nd-to-nth run of the same test, but it does make the AI a bit more deterministic.

There are some new features in GPT-4-Turbo that will let us handle determinism better, and we will be exploring that once GPT-4-Turbo is stable.

jaggederest2y ago

That makes a lot of sense, thank you for the explanation, I will have to explore this the next time I am building page tests. Have considered doing it myself but much happier using a relatively inexpensive product than maintaining the creaky homebuild version.

jaggederest2y ago

Thank you for the clarifying comment, this was really the thing I was meaning when I imprecisely said "reliability".

j / k navigate · click thread line to collapse