- BSD queue.h offers linked lists, queues, etc [1].
- UT-hash offers hash tables [2].
I've also read some people being frustrated with them, probably due to the quirky syntax/API. On the plus side (BSD at least) it's just a header file that you can download, include and start using.
https://attractivechaos.wordpress.com/2008/08/28/comparison-...
(Also provides kbtree.h, ksort.h, kstream.h, kvec.h)
[3] https://attractivechaos.wordpress.com/programs/
Also there are the stb header-only libraries. There is a hash table implementation buried inside stb.h
The standard library does come with a half-baked hash table implementation[1] (hcreate/hdestroy/hsearch) that only allows creation of _one_ global hash table. GNU libc[2] adds hcreate_r/hdestroy_r/hsearch_r that allows multiple tables to be created.
The APIs are strange and antiquated. On the other hand, the hash table implementation described in the article presents a much nicer interface.
[1] http://pubs.opengroup.org/onlinepubs/009695399/functions/hcr...
It looks like the same header, search.h, also includes functions for working with binary trees, where the user passes in the root pointer, and so which support multiple trees:
http://pubs.opengroup.org/onlinepubs/9699919799/functions/ts...
So this committee must have signed off on the hashtable stuff even though they had an example of how to do it properly right next to it!
Signed,
POSIX_ME_HARDER
I think the whole thing would be best as a single page, to be honest. Even if it's long, people have scroll wheels. But it's a stylistic choice.
Also, your double hashing scheme has a bug. You identified that it's problematic if hash_b returns 0, but adding 1 to the result doesn't actually solve anything, because now you have the same problem if hash_b returns num_buckets-1.
Hashtables are one of those data structures one can easily get away with never having to write on their own, yet still use every day. In scripting languages it's impractical to roll your own, & in the more abstracted low level languages there'll usually be a pretty good generic version. Yet there's much room for variation
I had the joy to be out of reach of off-the-shelf hashtables for luwa,
init: https://github.com/serprex/luwa/blob/436c7271120fcd116af30e9...
get/set: https://github.com/serprex/luwa/blob/436c7271120fcd116af30e9...
hash: https://github.com/serprex/luwa/blob/436c7271120fcd116af30e9...
Still need to implement Lua's sequential-portion-as-array logic. I do like the API not having a 'remove' method, opting intead for nil by default. nil value entries get vacuumed out when rehashing. This ends up acting as tombstone. In an earlier iteration I was using 0 as a marker distinct from nil, but an implementation detail is that nil is stored at address 0.. oops
The hard part is allocation memory, keeping track of the structure(array) holding the hashtable, etc.
What else is there? Implementing the hash function/code?
If you don't have a CS degree, you probably don't even know how lists, queues, stacks, etc really work. But the kind of programming that is done at a "high" level, you might not even need know it.
Instead of saying people should implement it JS/Ruby/Python/etc ( all of whom are OOPs ), they should learn C and implement it in C.
Not having to deal with boilerplate and housekeeping lets you focus on learning the topic at hand. Especially in a classroom setting where you're probably focusing on the specific subject instead of real-world dirtiness.
But after learning the basic concept, you do need an actual implementation if you want to do anything.
Also the use of strdup, while a reasonable choice, should also come with caveats about ownership and copying
I stopped at the use of pow for exponentiation in the hash function. Just iteratively multiply your factor and take the modulus on each round.
This would handle utf-8 just fine if characters are cast to unsigned in the hash function and a larger prime factor is used
None of these are real criticisms, I'd just love to see a pointer to more in depth information on how to really write hash functions
I hope that i missed it, but this definitely needs a clear statement that this is a pedagogical exercise and not a production ready implementation