Unless I really didn't want to introduce dependencies, or reduce code size, I think I'd use an off the shelf hash table implementation these days. It's still a fun exercise building your own though.
Also, in terms of this table, if you add the computed key hash value to the stored hash entry, you can check that the value of the hashes match before you do the full strcmp. If you have a weak hash you might also get a benefit from checking that the first characters match before calling the full strcmp.
It would also make rehashing easier since you already have the full key available to you and don't have to use your internal set function to move entries into the new table. In the posted implementation the big-O semantics of a rehash are worst case.
Anyways.. "man hsearch.3" if you want something cheap and easy.
One at a time per program, having to know how big the table needs to be at initialization time with no automatic resizing, an API that's very limited (no way to remove entries, iterate over entries, tell how many entries are in the table, ...)
Reentrant versions have existed for quite some time.
> having to know how big the table needs to be at initialization time with no automatic resizing
The posted implementation resizes by creating an entirely new table and then moving all entries. The exact same mechanism is available with hsearch.
> no way to remove entries, iterate over entries
Strong downside but the posted implementation has no removal function either. It also leaks key memory on resize and free. It's iterator would need modification to allow delete during iteration.
> tell how many entries are in the table
somewhat_reasonable_point++;
$ echo abcdef |tr abc ghi
ghidef
For an eight-bit character set, I found that building an array to map every character improved on linear search, even for short replacement strings and relatively short input strings.There isn't as easy a win for Unicode, so I played with some simple hash tables. Although the conventional wisdom is to use the high-order bits of a hash function's output, FNV-1a is not so good for short inputs of one or two bytes. (I used the 32-bit variant.) It was better just to use the low-order bits.
I know size_t wraps around on overflow so this would actually work, but this still makes me do a double take.
The whole thing was perhaps 200 lines of straightforward code. If you don't need to carry around support for all the third-prime-numbered-Tuesday-of-the-month scenarios that general purpose libraries have to, you end up with code you can understand fully and don't have to spend time reasoning about complexity that doesn't exist.
(So ... which takes longer, writing the data structure you want, as we all did then, or searching for the perfect library, as some people do now?)
EDIT: awk(1) had built-in hash tables, and dates from 1977. Note that generic hashing becomes much more useful after D-space gets larger than 16-64k! (14-16 bits of address)
Whenever I write an allocation I try to write a free after it, then fill in the code in between.
Use of static analysers and sanitisers helps too.
Like some people enjoy making all their hand tools from scratch in their workshop, there's a craft and discipline to follow and a satisfaction in doing things well, I enjoy the craft of and discipline C for the same reason.
https://github.com/webd90kb/webd/tree/master/codes/c_project...
https://github.com/webd90kb/webd/tree/master/codes/scripts/e...
I am fairly certain that in C it's actually impossible to have an object whose size is larger than half of the range of size_t unless ptrdiff_t is wider than size_t, which normally isn't. Unless, of course, C standard decided to make subtracting two valid pointers into the same array (or one past the end) a potential UB, just because.
> If the result is not representable in an object of that type, the behavior is undefined. In other words, if the expressions P and Q point to, respectively, the i-th and j-th elements of an array object, the expression (P)-(Q) has the value i - j provided the value fits in an object of type ptrdiff_t.
How to implement a hash table in C - https://news.ycombinator.com/item?id=26590234 - March 2021 (156 comments)
or even better, a full fledged linear probing hash lookup instruction.
"void* ht_get(...)"
Wait. What? A void pointer? Interesting... I have no clue.
I like articles like these. For someone not familiar with C it's a perfect level. In terms of explanation and the code itself.
The main use case is generic functions. And in data structures like this.
(And you don't need casts when converting between void pointers and data pointers)
I remember back in the late 90s to early 00s, I had fun countering OOP guys claiming C "cannot do XYZ" - but it obviously can.. the C way.
For method override, my mind was blown when I discovered function pointers!
It was my "Wait. What?" moment.. along with various other things!
Below is sample code (not tested)
// generic monster "class"
struct monster_t {
int health;
struct weapon_t *weapon;
// etc..
void (*attack)(struct monster_t *self); // our function pointer
// more function pointers for update, die, hit, etc...
};
// impl
void monster_notassigned_attach(struct monster_t *monster) { // safety net (todo)
printf("Attack not assigned to monster...\n");
}
void monster_dragon_attack(struct monster_t *monster) {
// attack for dragon
printf("Dragon is attacking you...\n");
}
void monster_wizard_attack(struct monster_t *monster) {
// attack for wizard
printf("Wizard is attacking you...\n");
}
// creation
void monster_dragon_create(struct monster_t *monster) {
// allocate, then assign.. and do...
monster->attack = monster_dragon_attack;
}
void monster_wizard_create(struct monster_t *monster) {
// allocate, then assign.. and do...
monster->attack = monster_wizard_attack;
}
void main() {
// testing...
struct monster_t monsters[10]; // can store various monsters!
monster_dragon_create(&monsters[0]);
monster_wizard_create(&monsters[1]);
for(int i = 0; i < 2; ++i) {
monsters[i].attack(&monsters[i]); // yeah you have to pass it in...
}
}
Not suggesting I support or follow the above especially for games. It is just a proof of concept!
I always tell people that coding in C (or any non-OOP language) should not be done the way you would in Java, C#, etc. As mentioned, the above is a demonstration.And yeah, a void * is how you express pointer to data of unknown type in C.
Does it matter? I tend to read it as "haven't touched it/had any use for it since university".