undefined | Better HN

0 pointsSilasX10y ago0 comments

But those situations aren't the same -- comparison sorts take O(n log n) because it is expressed in terms of n, the number of elements, which is orthogonal to element size. Even if you did account for the max element value V, it would be constant with respect to that value. The full bound would then be n log(V) log n -- regardless of your choice of V, you don't affect the scaling with respect to n.

But for hashtables, what exactly are we studying that doesn't care about hash time? The whole reason that a hashtable is supposed to save time is that you locate the desired key's value by computing the location from the key itself. To whatever extent that computation requires more steps, that cannot itself be abstract away -- if only because a limitation on key size is a limitation on table size.

0 comments

Cushman10y ago

That example may have misfired-- perhaps consider that sorting an array of n elements requires addressing n elements, which requires arithmetic on words of log n bits, which are therefore log time operations. Limiting the size of each element won't help your complexity.

But we really don't care about that very much. In this case.

It's kind of described in that SO post I linked-- you can assume a constant-time hashing operation, but if that offends your conscience, assume an "integer RAM" model where arithmetic operations up to a certain word size w are constant time. Then you can observe that any reasonable w is inconsequential to your problem domain, and go back to treating hashing as a constant-time operation :)

The idea is that the fact that w increases at least as the log of n is shared by all computations modeled in integer RAM, so it's not a useful contribution to analysis at that scale. It's in the model, it's just not generally useful to the model. If you ever find yourself asking, is the bit width relevant? You can do some math and get a straight answer.

Of course real world algorithms often hash strings which complicates the analysis, but that's orthogonal, like my misstep earlier in implying the size of an element rather than the size of n. The mathematical sense in which hashing time must increase with the log of n is a fact of all algorithms that have ns.

I think your surprise at this just stems from wanting a rigorous definition for something that's more of a modeling tool, with different models to choose from. In real life, there's the time on the wall since you clicked "Run", and everything else is theorygramming.

SilasXOP10y ago

You can't have it both ways: on the one hand, say that big-O is about arbitrarily large unrealizable values, but on the other hand make a bunch of exceptions for where things don't matter in practice, lest you be theory-gramming.

If you go the route that requires you to handle arbitrarily large tables, then computing a hash with a long enough value necessarily requires more steps, even if you assume all operations on length w integers are constant time -- because eventually your keys have to be larger than w, after which computing the hash increases in number of steps with the key size.

This is still different from the sorting problem. You can have a consistent computation model in which memory seek times are constant (even if unrealizable). That seems fundamentally different from a model where a hash computation is constant steps but also has arbitrarily long output.

The issue isn't whether the scaling of the hash computation matters in practice, but whether the big-O is properly O(1). And even if you did just care about "in practice" limitations, you're stuck with the problem of the hash computation taking longer than any realizable trie lookup would (as another poster mentioned is common), meaning that the O(1) claim misrepresents performance relative to another data structure.

Cushman10y ago

Edit: Ugh, the old version of this post turned out really lectury, and I'm still working on my own intuitive understanding here, so I'm just gonna start over.

Basically-- you're not wrong. It's a quirk of analysis that a hash table is allowed to take the modulo (for example) of an arbitrarily large number, producing an arbitrarily large number, in "constant time", while a trie is not allowed to descend to an arbitrary depth in constant time.

But saying that such a thing is possible is a simplifying assumption that we make, in general, for non-numeric algorithms: arithmetic is O(1).

Of course, real programs cannot do this for arbitrarily large numbers. So if you are actually concerned with arbitrarily large numbers, you must do something else. Thus, other sorts of analysis which would give you a different number for hash table complexity. Yay!

And if you're having difficulty using the reported big-O value under the canonical model to compare algorithms which have very similar amortized performance in the real world, your problem is just that big-O is not for that thing you're doing.

SilasXOP10y ago

Thanks for taking the time to revise and give a thorough and fair reply.

To give background, my main concern is with the exceptionalism: the defining (good) thing about STEM is that there's actual logic to the discipline, where if I forget an answer, I can re-derive it. It's not just a game of memorization and regurgitation.

But O(1) for hash lookup feels like the old soft-sciency "hey, you just have to memorize this". I don't have one consistent model that lets me derive bounds, and which produces O(1) for hash lookup and O(log n) for trie lookup. The other convention of constant seek times is unrealistic [2], but it's at least consistent. This isn't.

Rather, it's supposed to be a big-O table. If these were presented as "in-practice" bounds, with its own consistent logic about which resources are scarce relative to which, that would be a different story. Then I would have a consistent set of conventions that produce O(1); it wouldn't come off as "oh, we assume the rules don't apply here".

>But saying that such a thing is possible is a simplifying assumption that we make, in general, for non-numeric algorithms: arithmetic is O(1).

Where? Every treatment I've seen takes arithmetic as linear in the number of bits. [1] This is still consistent with O(n log n) for sorts because you can assume a bound on element size, and still only have constant factor overhead.

>Of course, real programs cannot do this for arbitrarily large numbers. So if you are actually concerned with arbitrarily large numbers, you must do something else. Thus, other sorts of analysis which would give you a different number for hash table complexity. Yay!

But (as above), the problem is it's not even clear what sort of problem class we were doing that let's you assume this particular thing can be bounded but others can't.

[1] https://en.wikipedia.org/wiki/Computational_complexity_of_ma...

[2] Pretty sure that in the physical universe, memory seek requires n^(1/3) because you have to cram it into a volume whose size increases with the cube of longest distance between addresses, and travel time is linear in distance.

1 more reply

j / k navigate · click thread line to collapse

0 comments

Cushman10y ago

But we really don't care about that very much. In this case.

SilasXOP10y ago

Cushman10y ago

Edit: Ugh, the old version of this post turned out really lectury, and I'm still working on my own intuitive understanding here, so I'm just gonna start over.

But saying that such a thing is possible is a simplifying assumption that we make, in general, for non-numeric algorithms: arithmetic is O(1).

SilasXOP10y ago

Thanks for taking the time to revise and give a thorough and fair reply.

>But saying that such a thing is possible is a simplifying assumption that we make, in general, for non-numeric algorithms: arithmetic is O(1).

But (as above), the problem is it's not even clear what sort of problem class we were doing that let's you assume this particular thing can be bounded but others can't.

[1] https://en.wikipedia.org/wiki/Computational_complexity_of_ma...

1 more reply

j / k navigate · click thread line to collapse