No bounded run time function can have an infinite output space (while still halting, anyway).
However by that logic the comparison function should also have a similar factor, since by the same argument the key values need to be large enough not to collide. Ie if the key values are just the natural numbers up to n for example, you'll need k_w words to store them and hence comparison takes O(k_w) = O(log(n)).
Thus a binary tree should also have an additional log(n) factor due to this, and hence be O(log(n)^2)).
However the point of big O is to compare, and since both have this log(n) factor you could just cancel that in both, leaving you with the familiar O(1) and O(log(n)) respectively for the hashmap and binary tree.
Is it sloppy to do that prematurely? Kinda yes, but it makes comparisons much easier as you don't have to go cancelling those log(n) factors all the time. And the point of big O is to compare the algorithms, not accurate running time.
1) You should have led with that to get to the point immediately instead of cryptically alluding to the existence of such a point and adding an irrelevant example.
2) Your point was already made in an initial reply[A]. You didn't need to make your comment at all, and if you were going to lazily allude to the existence of an argument, you could have just linked that one. Or, just upvote it. Comment sections don't need repeats of points already made unless you have something to add.
To address the point you did add:
3) You're still conceding the core point that you have to go outside the usual big-O model to get hashtables being O(1). In no other case does anyone take big-O to mean asymptotic complexity relative to comparable solutions for the same problem -- not unless that's explicitly stated. You can't have a data structure with constant-time lookup, period.
1. "But if the universe grows too big, then the integers get too big!". Fair point.
2. You concede that the Word-RAM model does accurately describe an O(1) hashing operation.
3. We now define a new computation model, where hashing is just O(1).
4. "this is still wrong, because this doesn't model the behavior as keys grow to infinity"
The important thing here is that the hashing oracle isn't defined to even care about integer arithmetic complexity. I think one thing you may be confused about is that it seems like the hashing oracle can only be implemented with integer-sized keys, actually the thing that epeople are trying to minimize in hashtable research is calls to that oracle. That oracle can be some genie in a lamp, my deadbeat cousin, or a computer program.
---
I think you even get that though on a logical level, so the crux probably goes something like, "The word-RAM model forces me to be of bounded size, but for higher level intuition, I just forget that? What? And then I stitch them back together? I mean sure, given that it's O(1), I get it, but there's no way this 'stitches together' cleanly, it's not actually O(1)!"
I didn't even know this term, but it seems like people have invented a term for this, the trans-dichotomous model. Basically, word size (integer size) scales with problem definition size. So I imagine that it's just taken for granted that trans-dichotomy sort of just "propagates up" your stack of logic, informally, if you want to understand each and every operation.
I mean, for me, the first paragraph was mostly fine enough for me - analyzing abstract models asymptotically w.r.t. various kinds of cost is already something I'm familiar with, so just saying hashing is an O(1) cost by definition makes sense.
The proofs doesn't have to know about what the hashing oracle actually is - the "integer" in the Word-RAM model is not the same as the "integer" produced by the hashing oracle, even though they both have "O(1)" cost within their respective models. I think viewing it as hooking them up together hurts you.
Word-RAM Integer is O(1) because it's trans-dichotomous, and in this specific hashtable analysis, the hashing oracle is O(1) just because it is by definition, we only care about # of calls to that function.
Even though the intuition is that they're hooking up together - recall that your hashing oracle backend can be your drunk cousin at 3am spouting out a consistent uniform random number mapping in the range [0...n-1], we're not really particularly interested in anything else.
Oh, and also, the oracle must take in two parameters, (key, size of space), and the assumption is that, for each size of space, the key is randomly uniformly consistently mapped. Although in the paper they don't even need to deal with variable hashtable resize schemes so they don't even need to re-define this.
But like again, like, we make these "constants" by saying, "I don't care how the hash function is performed, as long as it satisfies these properties; all I'm gonna try to do is implement an algorithm that does the minimal hashing possible". That's not nefarious, that's standard abstraction practice.
I did try to sit with your doubt for a while though and this is the best explanation I can come up with. If I didn't understand, oh well.
https://en.wikipedia.org/wiki/Word_RAM https://en.wikipedia.org/wiki/Transdichotomous_model