#ifndef BD9DF82A4540BB19368E48E4747C0706
#define BD9DF82A4540BB19368E48E4747C0706
Does some IDE do that for you? Did you just pick a random sequence? Why not just a variant of #ifndef _salmagundi_h that is more common?Other than that, I strongly echo other's advice about avoiding so many mallocs.
i#ifndef __foo_h <escape>yyp:s/ifndef/define/<enter>o<enter>#endif<escape>
(void)k_sz;
doing? I've seen fussy people doing: (void) printf( "hello" );
but not what you have written. As I say, probably ignorance on my part.It's only supported by gcc 11+ and clang 11+ (both from ~2021) because it was a suggested C2x extension. It's been accepted in C23 so other compilers will eventually support it, but I wouldn't hold my breath regarding when that happens.
Having a fuzz test was in-va-lu-ab-le. I caught several bugs right there. The delete function was particularly tricky.
I ended up with two fuzz tests as my hash table has a key feature: convergence. Having same contents, it would have exactly same bits in the buffer. In other words, it is independent of the insertion/deletion order. For this, I added another fuzz test. I would add a third one if I realize there is an important invariant I did not fuzz test. That is not much work, but so much useful!
https://github.com/gritzko/librdx/blob/master/abc/fuzz/HASH....
https://github.com/gritzko/librdx/blob/master/abc/fuzz/HASHd...
P.S. In your case, I recommend clang's libfuzzer with ASAN.
https://github.com/FRRouting/frr/blob/master/tests/lib/test_...
(this file is #include'd from another, with some #define set up:)
https://github.com/FRRouting/frr/blob/master/tests/lib/test_...
The "tribe" of C developers with distaste for the preprocessor will absolutely hate how this is written. But it works and caught real issues. It doesn't use an external fuzzing tool but rather a deterministic PRNG (see prng_* calls) to get a good mix of call combinations. Coverage analysis was done to make sure it reaches everywhere.
(disclaimer: I wrote this. And it did catch bugs. Also it really should have a lot of comments.)
https://github.com/c-blake/bst/blob/main/test/bst-interact.c
Essentially, you just write a very simple (some might say trivial) interactive command-shell that can include an auto-checking upon edit of (in that case tree) invariants keyed-off an environment variable. The only missing piece is some little tool to generate random inserts/deletes/etc.If the micro-shell has a pretty printer then it is easy to "replay" any failed series of edits with some "p" commands before & after invariant failures.
https://craftinginterpreters.com/hash-tables.html#deleting-e...
Additionally, in case of overwrite, the memory previously allocated for the key is leaked.
All in all, I don't think row 100 is really needed, and removing it wouldn't do any harm.
Finally, deletes are hard in hash maps. Either you decide that deletes are not allowed, or you implement them with tombstones or re-insertions. A half-backed solution is not an option.
Hope these suggestions help you.
Deletes are hard in open hash maps like here; they're trivial in chained hash maps.
(I'm only posting this because this seems to be an exercise/learning-ish HN post and I think it's a relevant point to not forget.)
Thinking about it, I don't grasp why the author has opted for linear probing here. Given the amount of malloc(), they might have used linked lists as well.
You can corrupt memory in such a way that allows executing arbitrary code, in which you could do anything that the process has privilege to do, which is probably reading and writing files as the current user and a lot more.
Algorithm for hash collisions is just to find the next empty slot (linear probing)? What happens when the original entry is removed, are the ones in subsequent slots cycled backwards? I don't see that...
Also the name is new to me! TIL 'salmagundi' is an English word (not even a loan-word) for a type of salad made of many ingredients: an excellent name for a hash map that can contain many effectively random items.
Many of my toy and hobby projects are exactly that. Some make allowances for the sake of performance and generality.
The hash map, though, is up there with sorting algorithms in the breadth of wild implementations. Salmagundi is unlikely to be the fastest, or the smartest. But it’s cute and portable.
I edited my comment with some more questions, btw, likely as you were answering - apologies if that ended up confusing.
https://www.nytimes.com/1978/01/16/archives/the-salmagundi-d...
First, a correctness question:
- if a struct contains padding, the value of these padding bytes is undefined. What happens if you use such a struct as a key?
And then some integration questions:
- how do you make this typesafe against mixing up 2 distinct uses of hashmaps in the application?
- how do you make this typesafe for its keys and values, i.e. remove accident-prone casts on read access?
- how do you not call the compare and hash functions through function pointers? (Calling through function pointers is both performance relevant as well as a target for exploitation by overwriting the pointer)
(None of these have "nice" answers. But thinking about them is IMHO extremely valuable since these are the most challenging in organizing and engineering an actual codebase itself.)
It's undefined for uninitialized objects. If I understand correctly, when an initializer is provided the padding bits are set to 0. Additionally, setting the whole chunk of memory to 0 w/ memset would work.
https://stackoverflow.com/a/37642061/24042444
> - how do you make this typesafe against mixing up 2 distinct uses of hashmaps in the application?
Lacking templates, the user of the library could create slim wrapper methods for each type (and coerce to void* to call the underlying hashmap lib); You could still accidentally call the wrong method but it would help a bit. I suppose you could wrap the real hashmap in a new type to help further (so each set of wrapper methods for a given type would be forced to use the same wrapper struct).
Alternatively, the library could provide helper macros to accomplish the same.
https://interrupt.memfault.com/blog/c-struct-padding-initial... looks at this (albeit a bit outdated; clang 13 and GCC 11) and claims to see undefined values even when initializers are given, at higher optimization levels
> Additionally, setting the whole chunk of memory to 0 w/ memset would work.
This seems to be the only thing that can be relied on to work (… sadly)
An alternate answer is: don't treat structs as chunks of bytes when the chunk crosses padding (i.e. create functions to hash specific structs)
Another alternate answer: remove the padding and add "_pad12[34]" members instead. But then you need to figure out where you actually have padding, which is architecture dependent…
In order to speed it up by reducing the number of malloc() calls, it may be worth adding a simple arena memory allocation measure: by allocating one larger block (e.g. 1 MB) initially and then doubling the memory size each time you run out, all malloc()/calloc() calls can become local salmagundi_alloc() calls that are just macro invocations that return an arena buffer pointer and also increment said pointer as a side effect.
I also recommend you have a look at Chris Hanson's book "C: Interfaces and Implementations", which has a few neat C API tricks that your code could benefit from (e.g. for reducing name space pollution, for avoiding null pointer argument errors, API method naming etc.).
Furthermore, a typical arena allocator only grows but doesn't shrink. A deleted item from the memory block will still hold the memory and will not be released. You will need a more sophisticated allocator for deletion. For fixed sized items, memory pool may be an option.
In the hm_put function, when you overwrite, why do you malloc and copy the key again, and what happens with the old key pointer and the old value pointer? (no free for the key and the memset zeros everything?)
Looks like the key logic is missing an overwrite case, the value is reallocated on overwrite.
Good catches :)
idx = (idx + 1) % map->cap;
becomes iterCount++;
idx = (idx + iterCount) % map->cap;Never change, HN.