As has been said, somewhat sadly, at various conferences: "Scoring functions suck." They can be trained within (close?) families, and they can show reasonable poses when enthalpies predominate, but non-bonded/grease interactions weren't, and probably still aren't, a strength. It's a start until a crystal structure is generated, and it will help if/when the SAR goes to hell and another one is required.
>... Existing methods are computationally expensive as they rely on heavy candidate sampling coupled with scoring, ranking, and fine-tuning steps. We challenge this paradigm with EquiBind, an SE(3)-equivariant geometric deep learning model performing direct-shot prediction of both i) the receptor binding location (blind docking) and ii) the ligand's bound pose and orientation. ...
That's both because the technology itself doesn't really produce good leads and because, even if you find good leads, there are many other high-probability reasons your drug will fail that have nothing to do with its safety and efficacy, and no ML model that exists today can address that.