- Claude 3 (Opus) https://gist.github.com/adaboese/d0b7397381726a7d394920e6a82ee39c
Both of these are outputs of the AIMD app. They were not made with a single prompt, but rather using RAG with over a dozen instructions. This allows testing a fairly broad range of expectations, such as adherence to instructions, error rate, and speed. Since the two model APIs are mostly compatible, I decided to compare them side by side.
A few interesting observations:
- Claude followed instructions much more closely than OpenAI. The outline provided in the initial instructions is quite close to the final article structure, despite multiple revisions.
- Claude's output scored better in its use of a broader set of data formats (tables, lists, quotes).
- Contrary to many tweets, Claude's output is not excessively verbose. Worth mentioning that part of the RAG instructions is to rewrite content for brevity.
- Claude took 5 minutes to execute 52 prompts. OpenAI took 7 minutes.
My app users are receiving the error "Routing deadline expired".
I am also seeing this error logged in Sentry, but there is no useful information associated with it (e.g. no stack trace, which is unusual for Sentry – all other errors have one).
There is no mention of this error anywhere in my codebase, and I cannot find it in node_modules either, i.e. I don't understand where it originates. I even tried downloading every script in the bundle and searching for similar errors – nothing.
Even more bizarrely, I cannot find any references to this error happening to others when searching Google. I also tried searching GitHub, but there are no references to it there either.
The only useful piece of information I have is that the error is caught by a React error boundary, so we know it originates from some script within that context.
My stack is Remix + Vite + React.
I also use Sentry and PostHog for monitoring.
I've sent support requests to different vendors, but figured I have little to lose by posting here too.
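For anyone debugging something similar: since the error is caught by a React error boundary, one way to get more context than the bare message is to forward React's `componentStack` (the second argument React passes to `componentDidCatch`) along with the error. The helper below is a minimal sketch of that idea; the function name and output format are my own, not part of Sentry's or React's API.

```javascript
// Format what React passes to componentDidCatch(error, errorInfo) into a
// single string that can be attached to a Sentry event as extra context.
// The helper name and message shape are illustrative, not a library API.
const formatBoundaryError = (error, errorInfo) => {
  const stack =
    errorInfo && errorInfo.componentStack
      ? errorInfo.componentStack.trim()
      : '(no component stack)';
  return `${error.name}: ${error.message}\nComponent stack:\n${stack}`;
};

// Inside an error boundary's componentDidCatch you might then do (illustrative):
//   Sentry.captureMessage(formatBoundaryError(error, errorInfo));
```

Even when a third-party script throws an opaque error, the component stack at least narrows down which subtree it surfaced in.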
https://gist.github.com/adaboese/bcde05aa8294924cc1f85718e197c024
TLDR
* fails if password contains the email
* fails if password is too short (<8)
* fails if password starts with a whitespace
* fails if password ends with a whitespace
* fails if password is not diverse enough (e.g. foofoofoo)
* fails if password only contains numbers
* fails if password contains common sequences (e.g. 123 or asdf) [this is really only so I could remove a bunch of passwords from the next dictionary-based step]
* fails if password contains common passwords (based on 10k most popular passwords)
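The rules above can be sketched roughly as follows. This is my own illustrative reconstruction, not the gist's actual code: the diversity heuristic (unique-character count), the sequence list, and the tiny stand-in for the 10k-password dictionary are all assumptions.

```javascript
// Placeholder data – the real implementation uses a larger sequence list
// and a 10k-entry common-password dictionary.
const COMMON_SEQUENCES = ['123', 'abc', 'qwerty', 'asdf'];
const COMMON_PASSWORDS = new Set(['password', 'letmein', 'iloveyou']);

// Returns the first failing rule's name, or null if the password passes.
const validatePassword = (password, email) => {
  const lower = password.toLowerCase();
  if (password.length < 8) return 'too short';
  if (password !== password.trim()) return 'leading/trailing whitespace';
  if (email && lower.includes(email.toLowerCase())) return 'contains email';
  if (/^\d+$/.test(password)) return 'only numbers';
  // Diversity heuristic (an assumption): too few distinct characters,
  // which catches repeats like "foofoofoo".
  if (new Set(lower).size < 4) return 'not diverse enough';
  if (COMMON_SEQUENCES.some((s) => lower.includes(s))) return 'common sequence';
  if (COMMON_PASSWORDS.has(lower)) return 'common password';
  return null;
};
```

Returning the first failing rule (rather than a boolean) makes it easy to show users a specific error message for each case.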
I have one conversation that was initiated by a complementary SaaS, and they are asking me to propose terms. I am considering offering 20% of the first-year revenue generated by the client, paid out at the end of each completed month.
What's a good benchmark to evaluate whether this is a good offer?