It does not include the vast majority of breaches that happen every year, whether they are reported to federal and state regulatory bodies or posted to cybercrime and ransomware sites.
One of the coolest things is that this process, though flawed, is transparent and semi-open to the public.
The dataset and the underlying process by which events are selected both live in the open on GitHub.
Kudos to the maintainers for their commitment to open source.