So one way to read C "gibberish" is to ignore the type at the beginning and parse the rest as an expression, like a normal parse tree. First we take foo. Then we dereference it (so foo is a pointer). Next we call it as a function with no arguments (so foo is a pointer to a function that takes no arguments). Next, we dereference it again. Then we index into the result as an array. Finally, we reach the end, so we look at the declared type and find that it is an int. So foo is a pointer to a function that takes no arguments and returns a pointer to an array of 3 ints.
You can also use this to go backwards. What's the syntax for a function that takes an integer argument and returns a pointer to an array of function pointers taking no arguments and returning integers? Well, we want to take foo, call it, dereference it, then index into an array, then dereference it again, then call it again, then return an int. Or int (* (* (foo)(int))[5])(void).
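Here's a rough sketch of both readings in actual C, assuming the first declaration being read is int (*(*foo)(void))[3]. The helper names (triple, get_triple, table, get_table, use_foo, use_table) are made up just to have something to point at; only the shapes of the declarations matter.

```c
#include <stddef.h>

/* Hypothetical data and helpers, used only to exercise the declarations. */
static int triple[3] = {10, 20, 30};

/* A function taking no arguments and returning a pointer to an array of 3 ints. */
static int (*get_triple(void))[3] { return &triple; }

static int one(void) { return 1; }
static int two(void) { return 2; }

/* An array of 5 pointers to functions taking no arguments and returning int. */
static int (*table[5])(void) = {one, two, one, two, one};

/* Takes an int, returns a pointer to an array of 5 such function pointers. */
static int (*(*get_table(int unused))[5])(void) { (void)unused; return &table; }

/* foo: pointer to function (void) returning pointer to array 3 of int. */
static int (*(*foo)(void))[3] = get_triple;

/* Use mirrors the declaration: dereference, call, dereference, index. */
static int use_foo(size_t i) { return (*(*foo)())[i]; }

/* Call, dereference, index, dereference, call: the result is an int. */
static int use_table(size_t i) { return (*(*get_table(0))[i])(); }
```

Each use expression is the declarator with the name's context filled in, which is the whole trick.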
Ada: type Ret_Typ is array (1..3) of Integer; Foo : access function return not null access Ret_Typ := null;
C: int (*(*foo)(const void *))[3]
Cdecl: declare foo as pointer to function (pointer to const void) returning pointer to array 3 of int
foo: fn() -> Box<[i32; 3]>
Alternately, if the pointer is into static memory and not something allocated on the heap: foo: fn() -> &'static [i32; 3];
That's pretty nice to look at and not too hard to read. In my opinion, for commonly used syntax (like fn decls), some well-chosen punctuation marks (', ->, :, in this case) are often a boon to readability compared to keywords. So I think the Rust syntax in this case is nicer than Ada's.
But in any case, while complicated C declarations may be uglier and take more effort to read than those in other languages, they are at least tractable once you learn the trick of "declaration follows use" and working backwards as GP describes.
Separately, though, what do you mean by your "gimmick" comment?
1. The syntax isn't "type id, id, id;", it's "type expr, expr, expr;" The trend for C-style languages has been to move to the former syntax, so C/C++ is the anomaly here.
2. Pointer declarators show up to the left of the name while function and array declarators show up to the right of the name. This means you can't figure out the type by scanning in one direction. Contrast this with LLVM, where function arguments and pointer types both go to the right of the leaf type (while arrays are infix), or Rust, where they both live on the left of the leaf type.
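To make point 1 concrete, here's a small sketch (the names answer and demo are mine, purely illustrative) where one declaration introduces an int, a pointer, an array and a function pointer, all sharing the leaf type int:

```c
/* Hypothetical helper so the function pointer has something to point at. */
static int answer(void) { return 42; }

static int demo(void) {
    /* One "type expr, expr, expr, expr;" declaration:
       i is an int, *p is an int, a[k] is an int, (*f)() is an int. */
    int i = 1, *p = &i, a[3] = {1, 2, 3}, (*f)(void) = answer;

    /* Each declarator, used as an expression, yields the leaf type. */
    return i + *p + a[2] + (*f)();
}
```

Note how the * sits to the left of p and f while [3] and (void) sit to the right, which is point 2.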
It is notable that the chapter in K&R which discusses declarations also presents a partial version of the cdecl program, and one of the exercises is for the reader to complete it, which really helps to dispel the notion that compilers are mysterious magic. In my experience, it's rare for an introductory book on a programming language to also contain such "hints" on how it could be implemented.
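For a feel of how small such a program can be, here's a minimal recursive-descent sketch in the spirit of that K&R program. It is not the book's code: the names (describe, dcl, dirdcl, emit) and the simplifications are my own assumptions. It only handles *, parentheses, [N], and a flat parameter list that it skips without describing.

```c
#include <ctype.h>
#include <string.h>

static const char *src;   /* remaining declarator text */
static char out[256];     /* English description built so far */

/* Append a string to the description, never overflowing the buffer. */
static void emit(const char *s) {
    strncat(out, s, sizeof out - strlen(out) - 1);
}

static void emit_char(char ch) {
    char s[2] = {ch, '\0'};
    emit(s);
}

static void skip(void) { while (*src == ' ') src++; }

static void dcl(void);

/* direct declarator: name or (dcl), followed by any number of (...) or [N] */
static void dirdcl(void) {
    skip();
    if (*src == '(') {                 /* parenthesized declarator */
        src++;
        dcl();
        skip();
        if (*src == ')') src++;
    } else {                           /* identifier */
        while (isalnum((unsigned char)*src) || *src == '_')
            emit_char(*src++);
        emit(":");
    }
    for (skip(); *src == '(' || *src == '['; skip()) {
        if (*src == '(') {             /* parameters are skipped, not described */
            while (*src && *src != ')') src++;
            if (*src) src++;
            emit(" function returning");
        } else {                       /* copy "[N" then close the bracket */
            emit(" array");
            while (*src && *src != ']')
                emit_char(*src++);
            if (*src) { src++; emit("]"); }
            emit(" of");
        }
    }
}

/* declarator: optional *s, then a direct declarator; the *s bind last */
static void dcl(void) {
    int stars = 0;
    for (skip(); *src == '*'; skip()) { src++; stars++; }
    dirdcl();
    while (stars-- > 0) emit(" pointer to");
}

/* Describe "type declarator"; returns a pointer to a static buffer. */
static const char *describe(const char *type, const char *declarator) {
    out[0] = '\0';
    src = declarator;
    dcl();
    emit(" ");
    emit(type);
    return out;
}
```

With this sketch, describe("int", "(*(*foo)(void))[3]") yields "foo: pointer to function returning pointer to array[3] of int". Qualifiers, nested parentheses inside parameter lists, and error handling are all left out.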
This could be fixed. Instead of "void (* f)(int * x)" we would write "void (* f)(x &int)". Now it makes sense, the declaration says that we could call the function if we pass the address of some int y, as if by "(* f)(&y)". The specific syntax "x &int" says that the address of an int is x, the same way as "int * x" says that dereferenced x is an int.
What about "void (* f)(int x[10])" (pretending arrays could actually be passed)? With the pointer we relied on the existing opposite of the dereference operator, but there is nothing like that for arrays, that would make an array out of an element. Let's look to Python for inspiration, where the expression "[y]* N" will make a list of N elements with the value y. This gives us: "void (* f)(x [int]* N)". See how the declaration tells us that we could call the function using "(* f)([y]* N)" for some int y.
There's one more we need to solve: "void (* f)(void (* g)(int))". Since the parameter g of * f is a function pointer, we need to pass the address of a function, so clearly & will be involved. But we need a function to take the address of, and we don't have any available. Inspired by the C++ lambda syntax, let's invent function conjuration: "(Args) -> Ret" is an expression that conjures a function taking Args and returning Ret. Hence the solution: "void (* f)(g &(int) -> void)". It says that you could write "(* f)(&(int) -> void)", to call * f with the address of a conjured function taking an int and returning void.
We do need to be aware that the syntax for arguments in function conjuration expressions is the same as in top-level declarations. So we would need to rewrite "void (* f)(void (* g)(void (* h)(int * x)))" as "void (* f)(g &(void (* h)(x &int)) -> void)". So for each function pointer, its arguments must be declared in the other declaration mode.
Since this makes no sense at all, we have to conclude that the original C declaration syntax forms need to be deprecated and only the newly invented syntax forms should be used.
x &int; (int * x)
x &&int; (int * * x)
f &(x &int) -> void; (void (* f)(int * x))
f &(x [int]* 10) -> void; (void (* f)(int x[10]))
The new syntax can also be used for function declarations: main (argc int, argv [&char]*?) -> int
{
return 0;
}
See how we've invented a different declaration syntax (some sort of dual of C's current syntax), that actually respects "declaration-mirrors-use" better than C does and makes much more sense to humans.
Everywhere else, you change C's declaration order of <declaration-specifier> <declarator>, in your new syntax, to place the identifier of the declarator first, followed by any pointer ops, and lastly the type. You are changing the pointer op "*" from a prefix that needed to be read right-to-left, after locating the identifier of the declarator, into a suffix "&" following the identifier, to be read left-to-right.
I agree that your change to left-to-right declaration order is definitely more readable.
2) But in your array syntax, borrowed from Python, the type is placed inside the array brackets, which used to hold the constant-expression denoting the array size. The array size is moved from within the brackets to be last, instead of the type being last, as in all your other syntax "rules". So, for arrays, the declaration syntax no longer reads simply left-to-right, since type is between declarator identifier and array size.
Wouldn't this be clearer, to have the type last and the constant-expression remain inside the array brackets? C syntax: (void (* f)(int x[10]))
use this instead for your new C syntax: f &(x [10] int) -> void;
3) I have a similar problem with your function syntax:
instead of:
main (argc int, argv [&char]*?) -> int { return 0; }
why not put the type last, so as to be consistent with all your other syntax?
main (argc int, argv [] &char) -> int { return 0; }
This is how the Go programming language does it, except for the preceding "func" reserved word and "string" in place of pointer to char: func main(argc int, argv [] string) int ...
5) The biggest problem I have is with adding "C++ lambda syntax" to C, to solve the problem of passing a function as actual parameter argument. That would mean you have 2 styles of pointers, one as a prefix and one as a suffix to the declarator identifier. So you now have to read both right-to-left and left-to-right, which seems to cancel out the benefits of only reading declarations in left-to-right order!
Would it be simpler, and preserve left-to-right declaration order, to provide a FunctionType as in the Go programming language? A parameter that is passed a function as argument is declared to have a FunctionType. Pointers to function are not apparently needed, at least not at the user level.
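For what it's worth, existing C already gets part of the way there with a typedef for a function type. A small sketch, with made-up names (int_callback, record, call_with, call_with_ptr); this is today's C, not the proposed syntax and not Go:

```c
/* A named function type: takes an int, returns nothing. */
typedef void int_callback(int);

static int last_seen;
static void record(int x) { last_seen = x; }

/* A parameter declared with a function type is adjusted to a pointer to it,
   so no pointer syntax appears at the user level here. */
static void call_with(int_callback cb, int value) { cb(value); }

/* The same parameter with the pointer spelled out explicitly. */
static void call_with_ptr(int_callback *cb, int value) { (*cb)(value); }
```

Both call_with(record, 5) and call_with_ptr(record, 9) work, and the parameter list reads left to right once the function type has a name.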
6) Q: How do these proposed changes affect the parsing of the new C syntax? Current C syntax can be parsed with predictive, non-backtracking parsers, in linear time. I don't want to use backtracking, GLR, or other complex methods, if they are avoidable. At least C can now be parsed with Yacc or Bison. (See A13 Grammar in K&R, "The C Programming Language" or Jacques-Henri Jourdan, François Pottier "A Simple, Possibly Correct LR Parser for C11")
Great explanation though, it really helps to read things inside-out
The declaration "int bar[3];" is an array of 3 ints, which are bar[0], bar[1] and bar[2]. Declaration mimics use but it's not exactly the same; in this case the size replaces the indices, which are all less than it.
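A tiny sketch of that difference between the declaration and its uses (function names are mine, just for illustration): the 3 in the declaration is a count, while the uses run over indices 0 through 2.

```c
#include <stddef.h>

/* The declaration says 3; that is the element count, not a valid index. */
static size_t bar_count(void) {
    int bar[3] = {7, 8, 9};
    return sizeof bar / sizeof bar[0];
}

/* The valid uses are bar[0], bar[1] and bar[2], each of which is an int. */
static int bar_sum(void) {
    int bar[3] = {7, 8, 9};
    int sum = 0;
    for (size_t i = 0; i < sizeof bar / sizeof bar[0]; i++)
        sum += bar[i];
    return sum;
}
```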
It used to be a shared host with a PHP script shelling out to the cdecl executable, written in K&R C. Now it's that same executable running on AWS Lambda.
Yes Lambda really will run arbitrary ELF binaries.
Could you run it as a preprocessor macro?
Knuth does not use C in his books.
std::function<foo(bar, baz)>
into foo(*)(bar, baz)?
int *pi;
Means that when I dereference the variable pi, I get an int. This also explains why int *pi, i;
declares `pi` as a pointer to `int` and `i` as an `int`. From this point of view it makes sense stylistically to put * near the variable.
Declaration of array types is similar. For example,
int arr[10];
means that when I take an element of `arr`, I obtain an `int`. Hence, `arr` is an array of ints.
Pointers to functions work the same way. For example,
int (*f)(char, double);
means that if I dereference the variable `f` and I evaluate it on a `char` and on a `double`, then I get an `int`. Hence, the type of `f` is "pointer to function which takes as arguments a char and a double and returns an int".
((void(*)(void))0)();
C has the weird thing that a literal zero in a pointer context becomes a null pointer, which may not actually have the bit pattern 0. And when the optimizer sees a guaranteed null pointer it tends to optimize the whole branch away, since entering it would be UB. So if you try calling address zero with "((void(*)(void))0)();" you might end up with the whole function optimized away, as well as any function it gets inlined into.
got: bad character '"'...apostrophe instead of double quote has the same result...well, I guess I expected too much