$ ls
01.jpg 03.jpg 03_copy.jpg 04.jpg 05.jpg
$ git init
Initialized empty Git repository in /tmp/test/.git/
$ git hash-object -w *
82f7d50fc89d2fd47150aff539ea4acf45ec1589
0080672bc4f248c400d569cce1a2a3d743eb1331
0080672bc4f248c400d569cce1a2a3d743eb1331
58db57b10c219b9b71f0223e58a6dc0d51cfe207
05dcde743807bddaf55ad1231572c1365d4db4af
$ find .git/objects -type f
.git/objects/00/80672bc4f248c400d569cce1a2a3d743eb1331
.git/objects/05/dcde743807bddaf55ad1231572c1365d4db4af
.git/objects/58/db57b10c219b9b71f0223e58a6dc0d51cfe207
.git/objects/82/f7d50fc89d2fd47150aff539ea4acf45ec1589
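Five files went in but only four objects came out, because 03.jpg and 03_copy.jpg hash to the same ID. A minimal sketch of how that ID is derived, assuming a default SHA-1 repository and loose (unpacked) objects; the function names `git_blob_hash` and `object_path` are made up for illustration and are not part of git:

```python
import hashlib


def git_blob_hash(data: bytes) -> str:
    """Return the object ID git would assign to a blob with this content."""
    # Git hashes a short header ("blob <size in bytes>" plus a NUL byte)
    # followed by the raw file content, not the content alone.
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def object_path(object_id: str) -> str:
    """Return the loose-object path: first two hex digits form the directory."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


if __name__ == "__main__":
    # Identical bytes always give the same ID, so copies collapse into one object.
    original = b"pretend these are JPEG bytes"
    copy = bytes(original)
    print(git_blob_hash(original) == git_blob_hash(copy))
    print(object_path(git_blob_hash(original)))
```

The file name never enters the hash, which is why a renamed copy is stored only once.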
If you're curious, you can read more about how it works here: https://git-scm.com/book/en/v1/Git-Internals-Git-Objects
Very cool!
[1] https://github.com/adrianlopezroche/fdupes
Edit: just noticed that it's using MD5, which is broken [2], and that it truncates the MD5 hashes!
[2] https://natmchugh.blogspot.ca/2015/02/create-your-own-md5-co...
MD5 is fine for deduplicating. It's extremely improbable that two different files would 'organically' produce the same MD5 hash.
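To put a rough number on "extremely improbable", here is a back-of-the-envelope birthday-bound estimate. It assumes the hash behaves like a uniformly random function over non-adversarial files; `collision_probability` and the `hash_bits` parameter are hypothetical names, and the 64-bit case is only an illustration of truncation, not a claim about how many bits any particular tool keeps:

```python
import math


def collision_probability(num_files: int, hash_bits: int = 128) -> float:
    """Approximate chance that at least two of num_files random hashes collide."""
    # Number of distinct file pairs that could collide.
    pairs = num_files * (num_files - 1) / 2
    # Birthday approximation: p = 1 - exp(-pairs / 2**bits).
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(-pairs / 2 ** hash_bits)


if __name__ == "__main__":
    # A billion files with a full 128-bit hash versus a 64-bit truncation.
    print(collision_probability(10**9, 128))
    print(collision_probability(10**9, 64))
```

With the full 128 bits the result for a billion files is on the order of 1e-21, while truncating to far fewer bits raises it by many orders of magnitude. None of this covers deliberately crafted collisions, which is the sense in which MD5 is broken.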
Even such a simple optimization can make a huge difference on a large directory of images or MP3s.
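The comment does not say which optimization is meant, so treat the following as an assumption: one common cheap trick is to group files by size first and hash only the files whose size matches another file's. A minimal sketch, with `file_digest` and `find_duplicates` as hypothetical helper names:

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, read in chunks to keep memory use flat."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return lists of paths under root whose contents hash identically."""
    # Pass 1: bucket by size, which needs only a stat call per file.
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only files that share a size with at least one other file.
    by_hash = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        for path in paths:
            by_hash[(size, file_digest(path))].append(path)

    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]


if __name__ == "__main__":
    for group in find_duplicates("."):
        print(group)
```

Files with a unique size are never read at all, which is where the saving comes from on a directory of large images or MP3s. A cautious tool would still compare the matching files byte for byte before deleting anything.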