I hadn't thought about using the file size or a hash to reduce collisions, but I chose to stick with the last-modified time in the article because computing hashes for a big directory tree can take hours.
Tools like rsync rely on the last-modified time (together with the file size) by default, and since I want to use this to track my own files, I won't fake the timestamps, so I think it's not a big deal?
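
For what it's worth, here is a rough sketch in Python of the kind of check I mean. It is not the code from the article; the names `snapshot` and `changed_files` are made up for illustration. It records the last-modified time plus the file size for each file, which only needs a stat call per file and never reads the contents.

```python
import os

def snapshot(root):
    """Map each file's relative path to (mtime_ns, size) without reading contents."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # lstat so a broken symlink does not raise; hypothetical design choice
            st = os.lstat(path)
            state[os.path.relpath(path, root)] = (st.st_mtime_ns, st.st_size)
    return state

def changed_files(old, new):
    """Return paths that are new or whose (mtime, size) pair differs."""
    return sorted(path for path, sig in new.items() if old.get(path) != sig)
```

Adding the size costs nothing extra since it comes from the same stat result, and it catches the case where a file was rewritten but its timestamp happens to match. A hash would only be needed if timestamps could not be trusted, which is not my situation.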