I keep coming back to something like collaborative filtering.
Raw, context-unaware scores may come from researchers, but 'real' scores depend on deployment context. So a vuln may have different final scores depending on context, and the community needs ways of identifying which bucket a given deployment falls into. Likewise, most libs probably have an 80/20 split in use cases, so a good guess at the default bucket for a typical user profile might further reduce the overhyping of raw scores. CI vs. server vs. browser vs. TCB vs. ...
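A minimal sketch of a context-adjusted score with a default bucket, assuming a hypothetical table of per-context weights (`CONTEXT_WEIGHTS`) and a list of reported deployment contexts for one library. The bucket names, the `regex_dos` class, and all weights are made up for illustration; they are not from any real scoring standard.

```python
from collections import Counter

# Hypothetical weights: how much a vuln class matters in each deployment
# context. Illustrative numbers only, not from any real scoring standard.
CONTEXT_WEIGHTS = {
    "regex_dos": {"ci": 0.1, "browser": 0.4, "server": 1.0},
}


def default_bucket(reported_contexts):
    """Most commonly reported deployment context for a library (the '80' in 80/20)."""
    return Counter(reported_contexts).most_common(1)[0][0]


def adjusted_score(raw_score, vuln_class, bucket):
    """Scale a context-unaware raw score by the weight for one context."""
    return round(raw_score * CONTEXT_WEIGHTS[vuln_class][bucket], 2)


# Hypothetical usage reports: 8 of 10 projects use the library only in CI.
reports = ["ci"] * 8 + ["server"] * 2
bucket = default_bucket(reports)
print(bucket, adjusted_score(7.5, "regex_dos", bucket))  # prints: ci 0.75
```

The same raw 7.5 would stay 7.5 for a project that reports the `server` bucket, which is the point: one vuln, different final scores per context.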
Ex: npm audit reports on dev dependencies by default, but a regex complexity (ReDoS) attack there is much less of a real threat, yet that's where most of the alerts seem to be. Instead of pushing the raw score to the user, we could push a use-adjusted score to the reporting tool, and the user could pick sensitivities for both.
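A rough sketch of the reporting-tool side, assuming each alert already carries a use-adjusted score and a flag for whether it is reached only through dev dependencies. `Alert`, `visible_alerts`, the package names, and the default thresholds are all hypothetical; this is not npm audit's actual behavior or API.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    package: str
    dev_only: bool         # reached only through dev dependencies
    adjusted_score: float  # use-adjusted score supplied to the reporting tool


def visible_alerts(alerts, dev_threshold=9.0, prod_threshold=4.0):
    """Keep alerts whose adjusted score meets the user's threshold for that
    dependency kind. Default thresholds are arbitrary placeholders."""
    return [
        a for a in alerts
        if a.adjusted_score >= (dev_threshold if a.dev_only else prod_threshold)
    ]


alerts = [
    Alert("regex-lib", dev_only=True, adjusted_score=5.3),
    Alert("regex-lib", dev_only=False, adjusted_score=5.3),
    Alert("build-tool", dev_only=True, adjusted_score=9.8),
]
for a in visible_alerts(alerts):
    print(a.package, a.dev_only)
```

The same 5.3 score surfaces for the non-dev use but is suppressed for the dev-only one, so the user tunes noise per dependency kind instead of triaging every raw alert.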
Collaborative filtering is one way; there are others as well. Without something like that, scores are largely ungrounded FUD / box checking.
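One way collaborative filtering could apply here, sketched under loose assumptions: treat each project's dependency set as its "profile", and guess how a target project uses a library from a similarity-weighted vote among the most similar projects whose usage bucket for that library is already known (neighborhood-based, user-user CF with Jaccard similarity). `predict_bucket`, the project data, and the package names are hypothetical.

```python
from collections import defaultdict


def jaccard(a, b):
    """Similarity between two projects' dependency sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_bucket(target_deps, library, labeled_projects, k=3):
    """Guess the deployment bucket of `library` in the target project from the
    k most similar projects whose usage of `library` is already known."""
    neighbors = sorted(
        (
            (jaccard(target_deps, deps), labels[library])
            for deps, labels in labeled_projects
            if library in labels
        ),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    votes = defaultdict(float)
    for similarity, bucket in neighbors:
        votes[bucket] += similarity  # similarity-weighted vote
    return max(votes, key=votes.get) if votes else None


# Hypothetical data: (dependency set, {library: known bucket}) per project.
projects = [
    ({"test-runner", "linter", "glob-lib"}, {"glob-lib": "ci"}),
    ({"test-runner", "bundler", "glob-lib"}, {"glob-lib": "ci"}),
    ({"http-server", "glob-lib", "db-driver"}, {"glob-lib": "server"}),
]
print(predict_bucket({"test-runner", "linter", "bundler", "glob-lib"}, "glob-lib", projects))
print(predict_bucket({"http-server", "db-driver", "glob-lib"}, "glob-lib", projects))
```

A project that looks like a build toolchain gets `ci` for `glob-lib`, while one that looks like a web backend gets `server`; the predicted bucket could then feed whatever context-adjusted score the tool reports.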