It's high time we start seeing software understand that data identity is totally severable from data location -- and if our existing tools can't do it, we're gonna start seeing more and more clever hacks like this to make it happen.
(This (or bifrost, the upstream) should probably not be using md5 in this day and age though! sha384 and blake2 are both much, much better choices, and immune to length extension issues. sha512 is also fine, as long as the content length is also tracked.)
I believe it was Alan Kay who said, "names don't scale". We've known about this problem for a half century. wget should be a semi-intelligent software agent, so that we humans can free up mental resources for the truly difficult problems.
Apparently Joe Armstrong also says something similar (http://joearms.github.io/2015/03/12/The_web_of_names.html).
Length extension lets you calculate the hash of appended data without knowing the original data. H(secret + data) is vulnerable when used as a MAC because, given that hash and the length of secret + data, you can calculate H(secret + data + padding + extra) without knowing secret. The solution there is to use an HMAC, and in that case MD5 is still sufficient. Length extension doesn't apply to the post's context at all.
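To make the HMAC point concrete, here's a minimal sketch using Python's standard `hmac` and `hashlib` modules. The function names `tag` and `verify` are made up for illustration, and MD5 is only the default because that's the case under discussion; any hashlib digest can be passed instead.

```python
import hashlib
import hmac


def tag(secret: bytes, data: bytes, digest=hashlib.md5) -> str:
    """Return a hex HMAC tag over data (hypothetical helper name)."""
    # HMAC wraps the hash in an inner and outer keyed pass, so knowing
    # the tag does not let an attacker extend the message the way a
    # bare H(secret + data) construction would.
    return hmac.new(secret, data, digest).hexdigest()


def verify(secret: bytes, data: bytes, expected_hex: str, digest=hashlib.md5) -> bool:
    """Check a tag in constant time to avoid leaking how many chars matched."""
    return hmac.compare_digest(tag(secret, data, digest), expected_hex.lower())
```

Swapping in a stronger hash is a one-argument change, e.g. `tag(secret, data, hashlib.sha256)`.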
SHA256 is sufficient to verify a file hasn't changed. The next step up would be to sign the file with a PGP key, which allows you to verify the source as well.
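For the "has this file changed" check, something along these lines would do. It's a sketch, not what the post's tool does: the function names and the 1 MiB chunk size are my own choices, and the file is read in chunks so large downloads don't have to fit in memory.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file incrementally and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def file_matches(path: str, expected_hex: str) -> bool:
    """Compare against a published digest, ignoring case and stray whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

This only tells you the bytes match the digest you were given; it says nothing about who published the digest, which is where the signature comes in.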
I wonder if it could look for `${currentAddress%$filename}md5sums.txt` (or similar), as that's often where the md5sum for a file is, and compare against that first, rather than downloading the whole file and hoping for the best?
e.g.
User wants file.tar.gz:aaaaaa...
wget-finder finds http://downloadable.com/file/file.tar.gz
wget-finder checks for (and downloads if present) http://downloadable.com/file/md5sums.txt
wget-finder compares the md5sum in md5sums.txt to the aaaaaa...
If it's good, it downloads the file (and still does the final check); if it isn't, it keeps searching, having avoided an unnecessary download.
Seems like it could be neat for large files (to avoid downloading the wrong file as often).
(Could also check for md5sum.txt or md5sums or md5.txt etc.)
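Roughly what I have in mind, as a sketch. Everything here is hypothetical (wget-finder has no such functions that I know of), it assumes the usual `md5sum` output format of `<hex>  <name>` or `<hex> *<name>`, and it leaves the actual fetching out so the logic stays testable offline.

```python
import string
from urllib.parse import urljoin

# Sibling file names to probe, in order (guesses, not a standard).
CANDIDATE_NAMES = ("md5sums.txt", "md5sum.txt", "md5sums", "md5.txt")


def checksum_urls(file_url: str) -> list:
    """Swap the last path segment of file_url for each candidate name."""
    return [urljoin(file_url, name) for name in CANDIDATE_NAMES]


def parse_md5sums(text: str) -> dict:
    """Map file name -> lowercase md5 hex for each well-formed line."""
    sums = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        digest, name = parts[0], parts[1].strip()
        if name.startswith("*"):  # md5sum marks binary mode with '*'
            name = name[1:]
        if len(digest) == 32 and set(digest) <= set(string.hexdigits):
            sums[name] = digest.lower()
    return sums


def precheck(file_url: str, wanted_md5: str, listing_text: str):
    """True: listing agrees; False: listing disagrees; None: file not listed."""
    filename = file_url.rsplit("/", 1)[-1]
    listed = parse_md5sums(listing_text).get(filename)
    if listed is None:
        return None
    return listed == wanted_md5.lower()
```

The caller would skip the candidate on `False`, and download and hash as usual on `True` or `None`, since the listing itself could be stale or wrong.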
Interesting hack, nonetheless.
You'd never have to worry about losing links to things if all of your links were magnet links and you hosted files with bittorrent not http.
A magnet link can be just a sha hash. You could write a browser plugin to rewrite all sha hashes into magnet links.
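A sketch of the rewriting step, with a caveat: for BitTorrent the hash in `xt=urn:btih:` is the SHA-1 of the torrent's info dictionary (the infohash), not the SHA-1 of the file itself, so this only works if the hashes on the page are infohashes. The function names are made up, and treating every bare 40-hex-digit token as an infohash is an assumption.

```python
import re
from urllib.parse import quote

# A v1 infohash is 20 bytes, i.e. 40 hex digits.
INFOHASH_RE = re.compile(r"\b[0-9a-fA-F]{40}\b")


def magnet_from_infohash(infohash_hex: str, display_name: str = None) -> str:
    """Build a minimal magnet URI; 'dn' is the optional display name."""
    if not re.fullmatch(r"[0-9a-fA-F]{40}", infohash_hex):
        raise ValueError("expected 40 hex digits")
    uri = "magnet:?xt=urn:btih:" + infohash_hex.lower()
    if display_name:
        uri += "&dn=" + quote(display_name, safe="")
    return uri


def linkify(text: str) -> str:
    """Replace every standalone 40-hex-digit token with a magnet URI."""
    return INFOHASH_RE.sub(lambda m: magnet_from_infohash(m.group(0)), text)
```

Longer hex strings (e.g. a 64-digit sha256) are left alone because the word-boundary anchors require exactly 40 digits.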
The real hurdle with that is releasing a bittorrent client that separated itself from the grey area of media piracy.
If firefox included native libraries for downloading magnet links, it would be invisible to users.
You could also write btwget using libtorrent (or patch wget to handle magnet links).