The dedupe engine written where I work chunks a file and hashes those chunks, which makes it somewhat harder to craft collisions (I forget where the chunking boundaries are, but they fall within a range, iirc). The hashing algorithm was SHA-1 last I checked, but I've never heard even company folklore of corrupted backups caused by hash collisions. I get the feeling that it's near impossible in practical terms, given the size of the string being hashed. Having said that, hubris is the downfall of programmers everywhere, so I wouldn't bet all my money on it.
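
To make the idea concrete, here's a rough sketch of chunk-and-hash dedupe. It is not our actual engine: the fixed `CHUNK_SIZE`, the in-memory `store` dict, and the function names `dedupe`, `restore`, and `collision_probability` are all made up for illustration (the real thing picks chunk boundaries within a range rather than at a fixed size). The last function is the usual birthday-bound approximation for accidental collisions.

```python
import hashlib
import math

# Assumed fixed chunk size for illustration; a real engine may vary boundaries.
CHUNK_SIZE = 4096


def dedupe(data, store, chunk_size=CHUNK_SIZE):
    """Split data into chunks, keep each unique chunk once, return the digest list."""
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha1(chunk).digest()
        # If two different chunks ever shared a digest, the second one would be
        # silently dropped here. That is the corruption scenario in question.
        store.setdefault(digest, chunk)
        recipe.append(digest)
    return recipe


def restore(recipe, store):
    """Rebuild the original bytes from the list of chunk digests."""
    return b"".join(store[digest] for digest in recipe)


def collision_probability(num_chunks, hash_bits=160):
    """Birthday-bound estimate: 1 - exp(-n(n-1) / 2^(bits+1))."""
    exponent = num_chunks * (num_chunks - 1) / 2 ** (hash_bits + 1)
    return -math.expm1(-exponent)
```

With a 160-bit hash, even a trillion stored chunks gives an accidental-collision estimate far below anything you'd see from hardware faults. Note that this only covers random accidents, not collisions someone crafts on purpose, which is where the "wouldn't bet all my money" part comes in.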