They mainly help in four ways:
- avoid data corruption when downloading/transferring/copying datasets
- notice changes/updates in the original dataset
- dataset versioning (think how e.g. git turns directories and files into hash trees -- also called content-addressing)
- most importantly: stable names without a naming authority