If you had a cupboard in your basement storing a can of spray with a "warning: flammable" sticker held on by weak glue, the sticker could fall off, you could use the can without knowing it's flammable, and you could end up setting your house on fire. Therefore the CVSS score of "weak sticker glue" is exactly the same as "your house is on fire".
This maximally inflated score is then combined with a primitive reporting policy that alerts on the mere presence of any dependency with an advisory anywhere in your dependency tree. Security vendors aren't checking whether those dependencies are actually affected in this context. This is a lazy recipe for labelling everything as critically vulnerable all the time.
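To make the difference between the two policies concrete, here's a toy Python sketch. Everything in it is hypothetical: the package names, the advisory data, and especially the CALLED set, which stands in for whatever reachability or usage analysis a scanner might do. Real tools vary a lot here, and this isn't meant to describe how any particular vendor works.

```python
from collections import deque

# Hypothetical dependency graph: package -> direct dependencies.
DEPS = {
    "my-app": ["web-framework", "yaml-parser"],
    "web-framework": ["template-engine", "http-utils"],
    "yaml-parser": [],
    "template-engine": [],
    "http-utils": [],
}

# Hypothetical advisories: package -> functions the advisory says are vulnerable.
ADVISORIES = {
    "yaml-parser": {"load_unsafe"},
    "http-utils": {"parse_multipart"},
}

# Hypothetical stand-in for usage analysis: (package, function) pairs the app calls.
CALLED = {("yaml-parser", "safe_load"), ("http-utils", "parse_multipart")}


def transitive_deps(root, deps):
    """Breadth-first walk returning every package reachable from root."""
    seen, queue = set(), deque([root])
    while queue:
        pkg = queue.popleft()
        for dep in deps.get(pkg, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


def presence_alerts(root, deps, advisories):
    """Presence-based policy: alert on any package in the tree with an advisory."""
    return sorted(p for p in transitive_deps(root, deps) if p in advisories)


def contextual_alerts(root, deps, advisories, called):
    """Context-aware policy: alert only if a vulnerable function is actually called."""
    return sorted(
        p
        for p in transitive_deps(root, deps)
        if any((p, fn) in called for fn in advisories.get(p, ()))
    )


print(presence_alerts("my-app", DEPS, ADVISORIES))             # ['http-utils', 'yaml-parser']
print(contextual_alerts("my-app", DEPS, ADVISORIES, CALLED))   # ['http-utils']
```

The presence-based policy flags both packages; the context-aware one drops yaml-parser because the app only calls its safe function.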
One thing I didn't analyse, but which might be interesting, is whether there's a consistent gap between the scores security vendors and researchers report in NVD and the scores general software vendors give their customers.
Do you work in this field or did you research all this as a hobby, or both?
I'd been meaning to write about CVSS for some time. I wrote a few paras and created the first plot, but that's as far as I got for about a year. One Saturday morning about two months ago I started thinking about it again and got the itch to finish it. It kept growing as I went because I kept finding more to write about.
Edit: the original ambition for my site was more about applying what I've learned about statistics, psychology and economics to general software development and ops. But now that my day job is mostly thinking about security, it has veered off in that direction instead (for the moment at least).
IMHO, the major issue at the moment is that people use only the CVSS 3.1 Base Score published by the NVD.
If you also factor in the Temporal Score (fed by CTI feeds, for example) and add the Environmental Score, you can get very good results when prioritizing the vulnerabilities on your assets, and the score reflects the real threat much better.
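For anyone who hasn't used it: the CVSS 3.1 Temporal Score is just the Base Score multiplied by three weights (Exploit Code Maturity, Remediation Level, Report Confidence) and rounded up to one decimal. Here's a minimal Python sketch using the weights from the 3.1 specification. The idea that a CTI feed supplies the exploit maturity value is my assumption about how you'd wire it up, not part of the spec, and the sketch doesn't cover the Environmental Score, which recomputes the base equations with modified metrics and security requirements.

```python
# CVSS v3.1 Temporal metric weights ("X" = Not Defined).
EXPLOIT_CODE_MATURITY = {"X": 1.0, "H": 1.0, "F": 0.97, "P": 0.94, "U": 0.91}
REMEDIATION_LEVEL = {"X": 1.0, "U": 1.0, "W": 0.97, "T": 0.96, "O": 0.95}
REPORT_CONFIDENCE = {"X": 1.0, "C": 1.0, "R": 0.96, "U": 0.92}


def roundup(value):
    """CVSS v3.1 Roundup: smallest one-decimal number >= value, float-safe."""
    int_input = round(value * 100000)
    if int_input % 10000 == 0:
        return int_input / 100000.0
    return (int_input // 10000 + 1) / 10.0


def temporal_score(base_score, e="X", rl="X", rc="X"):
    """Scale a published Base Score by the Temporal metrics.

    Assumption: `e` might come from a CTI feed (e.g. "a PoC exists"),
    `rl` and `rc` from vendor or advisory data.
    """
    return roundup(
        base_score
        * EXPLOIT_CODE_MATURITY[e]
        * REMEDIATION_LEVEL[rl]
        * REPORT_CONFIDENCE[rc]
    )


print(temporal_score(9.8, e="P", rl="O", rc="C"))  # 8.8
```

So a 9.8 with only a proof-of-concept exploit (E:P) and an official fix available (RL:O) comes out at 8.8; with the defaults ("X") it stays at 9.8.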
I would, however, also like to see CVSS 4 include a "cost to patch" component. In OT environments, CISOs like to use SSVC because it’s the easiest way to say "wait" instead of "patch now". But since SSVC isn't recognized by all auditors, it generates conflicts. Adding a CVSS component that reflects the cost of remediation on very complex devices, where deploying a KB requires stopping an entire factory, could produce the same outcome (i.e. "don’t patch now, wait") under a more respected scoring system.
From my perspective, that’s the only missing component for a good CVSS system :).
The problem is that the people best placed to score an issue from the reporting perspective won't necessarily have any idea what patching it actually costs. That's why CVSS shouldn't be used as the be-all and end-all metric for anything: there are a lot of factors unrelated to the vulnerability's relative severity that it doesn't account for.
The "other criticisms" section starts with the "You're doing it wrong" commentary, then moves on to two other groups whose position boils down to "You're doing it wrong, and the metric is bad because it encourages you to do it wrong". As a demonstration of diversity of opinion, that's entertaining at least.
As a metric, CVSSv3.1 is not designed to produce a uniform distribution of values from 0.1 to 10.0, and building a scoring system that does shouldn't generally be a goal. It is designed solely to answer the question "which issue is more severe?" when comparing different issues, and then to help direct and prioritize fix work. It isn't perfect at this, but it is superior to the other systems out there, especially when judging the pure severity of a given vulnerability in isolation.
I do get that people really do try to sell it as an infallible metric that means substantially more than it does. It's also often misread as "X is riskier because its score is higher", which is obviously wrong. If you have an authentication-related product, discovering certain categories of information leakage is obviously more damaging than finding cross-site scripting issues may be in general.
I think it is correct for a change in scope to have an outsized impact on the final score. The author seems to presume this is wrong (calling it the "villain" at one point) without really explaining why. A scope change essentially means lateral movement to other systems rather than the compromise of a single piece of software.
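For reference, here's how scope enters the CVSS 3.1 base equations, as a minimal Python sketch transcribed from the specification (check the weights against the spec before relying on it). A scope change swaps in a different impact formula, raises the Privileges Required weights for L and H, and multiplies the sum by 1.08 before rounding, so the same vector scores higher. The example vector (AV:N/AC:L/PR:N/UI:R, C:L/I:L/A:N) is an XSS-style one chosen purely for illustration.

```python
# CVSS v3.1 Base metric weights.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}  # PR weighs more when scope changes
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value):
    """CVSS v3.1 Roundup: smallest one-decimal number >= value, float-safe."""
    int_input = round(value * 100000)
    if int_input % 10000 == 0:
        return int_input / 100000.0
    return (int_input // 10000 + 1) / 10.0


def base_score(av, ac, pr, ui, scope, c, i, a):
    """Compute the CVSS v3.1 Base Score; scope is "U" (unchanged) or "C" (changed)."""
    changed = scope == "C"
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
        pr_weight = PR_CHANGED[pr]
    else:
        impact = 6.42 * iss
        pr_weight = PR_UNCHANGED[pr]
    exploitability = 8.22 * AV[av] * AC[ac] * pr_weight * UI[ui]
    if impact <= 0:
        return 0.0
    if changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))


for scope in ("U", "C"):
    print(scope, base_score("N", "L", "N", "R", scope, "L", "L", "N"))
# U 5.4
# C 6.1
```

Flipping only S:U to S:C moves that vector from 5.4 to 6.1; for AV:N/AC:L/PR:N/UI:N with C:H/I:H/A:H it moves from 9.8 to the 10.0 cap.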
Could a better metric be designed? Sure. I'd like to see additional degrees of user interaction accounted for, to give just one example. For another, the Network, Adjacent, Local and Physical attack vector categories could use some more fleshing out for the modern age.
Does that mean alternative approaches are better? Not in my experience. All the alternatives I've encountered basically boil down to "we made our own system, we don't publish the calculations, and a lot more stuff comes out as critical impact and risk" whenever you get reports. I've literally had third-party pentest teams try to sell me an Info Exposure that showed server IPs in a log as a High, because they used their own metric.
I'd argue that for what it is intended to do, CVSSv3.1 does a good enough job, and that's why so many people have accepted it as a standard.