E.g. the doc for ffmpeg, which I checked by downloading the Docker image they provide to the model, is a README which basically just says this is ffmpeg and the docs can be found online. But they do not allow models to go online.
So a model is supposed to reverse-engineer a black box using only a limited number of tries. I'm not sure even an ASI could do this under these constraints (without having memorized the ffmpeg code base, obviously).
In the only posts, one of the authors mentions "usage docs". Obviously they had a command-line tool like `grep` in mind -- where a man page sort-of specifies program behavior. But then they added sqlite, ffmpeg, php, etc., where a usage doc is like one millionth of the information you need to implement ffmpeg.
And, of course, there's no human baseline. I'd guess making such a baseline would cost billions of dollars.