Direct string comparison of the current URL to previously submitted ones doesn't work because there are many ways for the same web page to have different URLs. For example, the URL fragment (the part after the "#", which may or may not be present) can differ. There can also be tracking parameters (often, but not necessarily, prefixed with "utm_") that don't change anything about the page. But URL parameters can't be disregarded entirely, because some sites, forums in particular, rely on them: consider a site that serves different pages through an "?id=..." parameter. So some parameters should be removed and others shouldn't. The same website having multiple domains (or domains that change over time) complicates things further.
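A minimal sketch of the parameter problem, using Python's standard `urllib.parse`. It assumes that only keys starting with `utm_` count as tracking noise (the `TRACKING_PREFIXES` tuple is an illustrative assumption, not a complete list). It drops the fragment and those keys but keeps meaningful parameters like `id`:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed tracking-parameter prefixes; real-world lists are much longer.
TRACKING_PREFIXES = ("utm_",)


def strip_noise(url: str) -> str:
    """Drop the fragment and tracking parameters; keep everything else."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
    ]
    # Sorting assumes parameter order never changes which page is served.
    kept.sort()
    # An empty fragment ("") removes everything after "#".
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


print(strip_noise("https://example.com/item?id=42&utm_source=hn#comments"))
```

Two URLs that differ only in fragment or `utm_*` parameters collapse to the same string, while a different `id` still yields a different one.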
My solution was to "canonicalize" URLs by transforming them into a simplified form using some pretty rough heuristics for common sources of noise. The Python code to do that is here: https://github.com/jstrieb/hackernews-button/blob/master/can...
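As a rough illustration of the same idea (not the code from the linked repository), here's a sketch of host and path normalization plus a lookup against previously submitted URLs. `DOMAIN_ALIASES` and its entries are hypothetical. Ignoring the scheme, stripping "www.", and stripping trailing slashes are likewise assumed heuristics, not guarantees that two pages are identical:

```python
from urllib.parse import urlsplit

# Hypothetical alias table mapping old or alternate domains to one canonical domain.
DOMAIN_ALIASES = {
    "old-forum.example.org": "forum.example.org",
}


def canonical_key(url: str) -> str:
    """Reduce a URL to host + path + query, ignoring scheme and fragment."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # .hostname is already lowercased
    if host.startswith("www."):
        host = host[len("www."):]
    host = DOMAIN_ALIASES.get(host, host)
    # Heuristic: treat "/thread/" and "/thread" (and "" and "/") as the same page.
    path = parts.path.rstrip("/") or "/"
    key = host + path
    if parts.query:
        key += "?" + parts.query
    return key


def already_submitted(url: str, submitted: list[str]) -> bool:
    """Check whether any previously submitted URL has the same canonical key."""
    seen = {canonical_key(s) for s in submitted}
    return canonical_key(url) in seen


print(already_submitted("http://WWW.Forum.Example.org/thread/?id=7#reply",
                        ["https://old-forum.example.org/thread?id=7"]))
```

Every heuristic like this trades false positives against false negatives. Stripping too much merges distinct pages, and stripping too little misses real matches.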
All this is to say that even though I've used my extension for months and have been quite happy with it, there will inevitably be false negatives.