Note that you save no memory by not naming the implicit (in this implementation) dummy node. A pointer to a pointer remains a pointer to a pointer whether you name the intermediate pointer or not. Personally I think it's more readable to name it. Also, I prefer curr != NULL to simply putting *curr in a condition context. The compiler is going to do exactly the same.
I don't favour Linus' version because it's not readable. Most people who haven't seen this before will need to take some time to check this code. Out of the three basic ways of doing this I'd place this one dead last.
Your argument is like trying to claim that twenty lines of string munging is more readable than a straightforward regex. That's only true if you don't really get regular expressions.
If you understood pointers well enough, you would see Linus's version as more readable. The other one has so much more unnecessary fluff that makes it extra complicated and less readable. If you don't understand pointers well enough, and "haven't seen this before", of course you'll find it confusing.
Readability is in the eye of the beholder.
Sounds like an argument for not using C in the first place. If you need to understand pointers so well that the code presented in this article looks readable just to write a clean implementation of remove_if, your language is seriously lacking in expressiveness.
Data structures are best reasoned about abstractly. A linked list is either null or a pair [data, next] where next is a linked list. remove_if(list, fn) returns null if list is null; if list is [data, next], remove_if returns next if fn(data) is true, otherwise [data, remove_if(next)]. These definitions are much easier to understand than code with two levels of indirection; they are also easily restated as code, without requiring so many levels of pointers.
To put it another way, the shorter the transition between a natural language definition of a data structure/operation and the source code, the better. Since it is almost never the case that a natural definition will involve pointers, pointers should be used sparingly in an implementation. Pointers are unintuitive, easy to make mistakes with, and have been the source of most of the serious bugs we have dealt with over the past few decades; when I see the Linux kernel panic, it is usually because of a problem with a pointer.
I would rather see a few extra lines of code that show that a programmer took the time to think about what they were doing than a "clever trick" that is hard to reason about and which is probably hiding some subtle bug.
I agree - and as someone who makes his living as a C programmer, I'm sure someone could easily come up with a terse piece of Ruby code that I would view as too clever, and therefore confusing, but which the vast majority of programmers on here would look at and think, "a beginner may be confused by this, but a competent Ruby programmer would not be."
I know some of this comes down to stylistic preferences, and likely some have had bad experiences maintaining code that went overboard with this sort of thing. But I also think that if these snippets of code were being discussed in a forum with a bias towards low-level development, rather than HN, which seems biased towards web development, you'd be seeing some very different responses.
As I said before, both ways are textbook basic singly linked list deletions. Linus' version does the dummy pointer implicitly. Generally doing things like that requires an extra moment to realise what's going on.
The smartass idea that "anything that is understandable is equally understandable" simply falls flat in the face of reality.
To claim one piece of code to be better than another you need a more scientific argument than "avoiding an if" if you add something else (an indirection) instead.
Speaking of the word clever, I think some are clutching that a little too proudly. In this context it doesn't mean "more skilled" or "more efficient" or "technically better", it simply means doing something in a particular way because you think it is unique. There is little to no argument to be made in favour of the approach in the linked blog entry, but there are many potential risks with it, not least of which is that it adds one more potential misunderstanding by skilled practitioners. While I love pointers to bits, and love unmanaged, native code, there is a good reason why most environments now trying to reduce their usage, even with the bestest of the bestest developers manning the keyboards.
That said, it's actually an excellent C interview question. Combined with "write a binary tree iterator" it reliably measures one's exposure to thoughtful C programming.
This was actually one of the very first questions we used to ask during interviews at Tenable. I'm honestly kind of amazed it's now seen as some sort of tricky technique. I would think (or hope) most C programmers would see it as basic competence with pointers.
Which is exactly what Torvald's was saying. All the comments here going "that's unreadable" or "it's too clever" scare me. And I don't even consider myself some C coding rockstar (took me a bit to sort out my own mental picture of pointers).
Then again, the 'rm' predicate may dominate, or removes may be infrequent, and then who cares? Or your CPU may not care about LHS stalls (by virtue of being very whizzy, or just so terrible that the optimization doesn't matter).
Which is to say, measure and be prepared to be flexible rather than statically clever.
My personal rule of thumb concerning pointers is that two-stars should never be seen except in the case where you want a function to be able to modify a pointer that was passed in as a parameter. That's it. If I get handled a module that has two-stars in it under any other circumstance, I refactor it. I do this because:
a) lots of people find it difficult to thing about pointers to pointers (although I've never understood why empirically I note that this is the case).
b) Having to dereference two pointers to get to the information that you want is not just slow because of extra work needing to be done, but also because you've just increased the need for doing a page change in memory, which on some architectures can be surprisingly expensive.
(Does anybody know how you type an asterisk?)
It certainly shouldn't. There's no difference at an architectural level between keeping track of prev and keeping track of &prev->next (as should be obvious, because they differ by a constant offset): advancing in the list is just a pointer update via a load either way, and removing a link is just a store.
Since they do the same thing, the opportunity for an LHS stall may be avoided in either approach with a little care.
Load-hit-stores are one of those pernicious things that bite you a couple times, then you start noticing them. They are indeed "architectural" to a number of processors common in video game consoles, where cycle stealers get a lot of sunlight.
void remove_if(node ** head, remove_fn rm)
{
node * entry = *head;
if (rm(entry))
{
*head = entry->next;
free(entry);
}
else
head = &entry->next;
remove_if(head, rm);
}
Ahh, much better: Now we can refactor: void remove_head(node ** head)
{
node * entry = *head;
*head = entry->next;
free(entry);
}
void remove_if(node ** head, remove_fn rm)
{
if (rm(*head))
remove_head(head)
else
head = &entry->next;
remove_if(head, rm);
}
I'm sure GCC will be able to optimize away the tail call.Edit: oops, I forgot to test for empty lists, which was implicit in the for loop I set out to remove…
node *remove_if(node *head, remove_fn rm)
{
if (rm(head)) {
return remove_if(head->next);
}
head->next = remove_if(head->next);
return head;
}
A good use of this technique (returning pointers to the resulting node) is the following code which very cleanly implements AVL trees in C (from http://infoarena.ro/multe-smenuri-de-programare-in-cc-si-nu-... -- scroll to the end (sorry, main article in Romanian)): #define max(a, b) ((a) > (b) ? (a) : (b))
#define geth(n) (n->h = 1 + max(n->l->h, n->r->h))
struct node
{
int key, h;
struct node *l, *r;
} *R, *NIL;
typedef struct node node;
void init(void)
{
R = NIL = (node *) malloc(sizeof(node));
NIL->key = NIL->h = 0,
NIL->l = NIL->r = NULL;
}
node* rotleft(node *n)
{
node *t = n->l;
n->l = t->r, t->r = n,
geth(n), geth(t);
return t;
}
node* rotright(node *n)
{
node *t = n->r;
n->r = t->l, t->l = n,
geth(n), geth(t);
return t;
}
node* balance(node *n)
{
geth(n);
if (n->l->h > n->r->h + 1)
{
if (n->l->r->h > n->l->l->h)
n->l = rotright(n->l);
n = rotleft(n);
}
else
if (n->r->h > n->l->h + 1)
{
if (n->r->l->h > n->r->r->h)
n->r = rotleft(n->r);
n = rotright(n);
}
return n;
}
node* insert(node *n, int key)
{
if (n == NIL)
{
n = (node *) malloc(sizeof(node));
n->key = key, n->h = 1, n->l = n->r = NIL;
return n;
}
if (key < n->key)
n->l = insert(n->l, key);
else
n->r = insert(n->r, key);
return balance(n);
}
node* erase(node *n, int key)
{
node *t;
if (n == NIL) return n;
if (n->key == key)
{
if (n->l == NIL || n->r == NIL)
{
t = n->l == NIL ? n->r : n->l;
free(n); return t;
}
else
{
for (t = n->r; t->l != NIL; t = t->l);
n->key = t->key,
n->r = erase(n->r, t->key);
return balance(n);
}
}
if (key < n->key)
n->l = erase(n->l, key);
else
n->r = erase(n->r, key);
return balance(n);
}
int search(node *n, int key)
{
if (n == NIL) return 0;
if (n->key == key) return 1;
if (key < n->key)
return search(n->l, key);
else
return search(n->r, key);
} avl_tree_rotate(avl, &t->left->right->left, &t->left->right, &t->left, t->left->right);
Where avl_tree_rotate looked like this (avl_tree_update just recomputes the height after left or right branch height changes): static void avl_tree_rotate(avl avl, V **from, V **to, V **parent, V *child) {
if(*from) (*from)->parent = *to ? (*to)->parent : NULL;
if(*to) (*to)->parent = *from ? (*from)->parent : NULL;
avl_tree_swap(avl, from, to);
child->parent = (*parent)->parent;
if(!((*parent)->parent))
avl->root = child;
else if((*parent)->parent->left == *parent)
(*parent)->parent->left = child;
else if((*parent)->parent->right == *parent)
(*parent)->parent->right = child;
(*parent)->parent = child;
*from = *parent;
avl_tree_update(avl, *parent);
avl_tree_update(avl, child);
}
It was one of nicer programming workouts I ever got (the whole class was a great experience and I think every professional programmer should go through implementing all the basic data structures and algorithms in plain C one time), but I never managed to get deletion right, despite spending quite some hours on it, it just went beyond my head. Not that it matters for linked lists, and in this case I made it deliberately harder on myself, but I think the moral of the story is that when you are too clever with implementation details, you might run out of your cleverness when starting to deal with the actual problem domain.That reminded me of this quote: "Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?"
"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." --Brian Kernighan
Readability and maintainability trump developers showing how clever they are by far.
i just checked it and, yes, i used "two stars".
now i'm reading all the comments here, which are predominantly critical, calling this "clever" code. and it struck me just how lucky i am to work with decent programmers. where this kind of thing is considered quality work, not "showing off".
what i don't understand is why the other commenters here use pointers at all. couldn't you rewrite this to use arrays with integer indices? then you wouldn't need any stars (or very few), and it would be even "easier".
well, that's not true. i do understand. everyone thinks their own level is the correct one.
but still, i have sometimes been lucky enough to work with people who knew a lot more than i did. when i was in that position i didn't think they were "showing off". i thought they were awesome and tried to learn from them.
My rule of thumb is I can write “clever” code if I make sure it would be easy to debug for me or anyone else. This usually includes indented logging for anything complicated, good choice of variable names and plenty of debug assertions that would fail with a meaningful message if something went wrong.
When the deadline comes and you're dealing with an obscure bug in a clever code that looks right but is difficult to reason about, you'll be damning it.
How was this not easy code to write? I imagine it would have looked like this, but in your language of choice:
(defun remove-if (lst pred)
(if (null lst)
nil
(if (funcall pred (first lst))
(remove-if (rest lst) pred)
(cons (first lst) (remove-if (rest lst) pred))
)
)
)
This is straightforward and basically follows from the definition of a list: either null (empty) or a pair whose second element is a list. The C version is uglier because of the need to deallocate removed nodes, track pointers, etc., but it still follows from the definition of a list.it was c. i made some mistakes. i fixed them.
Also the code proposed may work for linked lists that is a very well understood topic, but code like this in general is hard to understand and debug, without actually providing some serious advantage.
I remember some code on the Linux kernel implementing the inode caching in the virtual file system layer that was a mess like that, but multiplied by 1000. Very "Smart" indeed, but for the sake of what?
It's more elegant then the double level pointer, imho.
Results with {gcc-4.7.2,clang-3.2} -march=corei7 -O3
build_onestar+remove_if_onestar: 0m0.309s 0m0.357s
build_onestar+remove_if_twostar: 0m0.302s 0m0.324s
build_twostar+remove_if_onestar: 0m0.310s 0m0.361s
build_twostar+remove_if_twostar: 0m0.302s 0m0.320s
The two-star version of remove_if() is a bit faster here, but both build() versions are the same, across both compilers.Perhaps this is a result of many people learning in languages like Java where the idea of double indirection isn't quite so explicit so this style feels somewhat unnatural?
I could easily imagine introducing very weird bugs by mixing up my pointers and pointers-to-pointers. To mentally visualise a linked list I find it easier to just imagine a stack of boxes attached by string and work on the basis of cutting and reattaching the string to different points, this is more difficult under the 2nd approach.
EDIT: I see it's also linked in the article.
Usually there are countless possibilities for the exact same thing in every program. Personally, I would never say those who produce code with unused possibilities do "not understand" the corresponding language features.
Especially when writing example code or working with less experienced programmers, I even think that using less language features for the sake of making your algorithmic idea as obvious as possible is better practice.
For exmapel, I find it far more important that one decides wisely when to use a linked list at all or, if there are multiple fuctions for removal, chooses the right one for a particular use-case. Therefore everyone has to understand whats happening.