Cdecl – Turns English phrases into C declarations

(cdecl.org)

150 points · dammitcoetzee · 7y ago · 50 comments

35 comments · 13 top-level

jcranmer7y ago· 8 in thread

Here's an easy way to understand how these things work: in C, the type of a pointer/function/array mess is declared by how it's used. For a declaration like "int ( * ( * foo)(void))[3]", you can read it as "for a variable foo, after computing the expression ( * ( * foo)(void))[3], the result is an int."

So one way to read C "gibberish" is to ignore the type at the beginning and parse the rest as an expression like a normal parse tree. First we take foo. Then we dereference it (so foo is a pointer). Next we call it as a function with no arguments (so foo is a pointer to a function that takes no arguments). Next, we dereference it again. Then we index into the result as an array. Finally, we reach the end, so we look at the declared type and find that this type is an int. So foo is a pointer to a function that takes no arguments and returns a pointer to an array of 3 ints.
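To make that reading concrete, here is a minimal C sketch. The names table, get_table and read_via_foo are made up for illustration; only the declaration of foo comes from the comment above. It declares foo exactly as written and then evaluates the same expression shape to get an int back.

```c
#include <stddef.h>

/* Illustrative data: the array that the returned pointer refers to. */
static int table[3] = {10, 20, 30};

/* A function taking no arguments and returning a pointer to an array of 3 ints. */
static int (*get_table(void))[3]
{
    return &table;
}

/* foo: pointer to function (void) returning pointer to array 3 of int. */
static int (*(*foo)(void))[3] = get_table;

/* Use mirrors declaration: dereference foo, call it, dereference the
 * result, index it, and what comes out is an int. */
int read_via_foo(size_t i)
{
    return (*(*foo)())[i];
}
```

Calling read_via_foo(1) walks the same steps as the prose: take foo, dereference, call, dereference, index.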

You can also use this to go backwards. What's the syntax for a function that takes an integer argument and returns a pointer to an array of function pointers taking no arguments and returning integers? Well, we want to take foo, call it, dereference it, then index into an array, then dereference it again, then call it again, then return an int. Or int (* (* (foo)(int))[5])(void).
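The backwards direction can be checked the same way. This is only a sketch: the handlers array, the one/two functions and call_handler are hypothetical helpers, while the declarator for foo is the one derived above.

```c
/* Two trivial functions to populate the array (illustrative only). */
static int one(void) { return 1; }
static int two(void) { return 2; }

/* Array of 5 pointers to functions (void) returning int. */
static int (*handlers[5])(void) = {one, two, one, two, one};

/* foo: function (int) returning pointer to array 5 of
 * pointer to function (void) returning int. */
int (*(*(foo)(int unused))[5])(void)
{
    (void)unused; /* the argument is not needed for this demonstration */
    return &handlers;
}

/* Use mirrors declaration: call foo, dereference, index,
 * dereference again, call again, and the result is an int. */
int call_handler(int i)
{
    return (*(*(foo)(0))[i])();
}
```

The body of call_handler is the declarator with the names replaced by values, which is the whole point of the trick.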

exitcode007y ago

How about just using Ada? It has the added bonus of not being a gimmick (depending on who you ask I suppose ; )

Ada: type Ret_Typ is array (1..3) of Integer; Foo : access function return not null access Ret_Typ := null;

C: int (*(*foo)(const void *))[3]

Cdecl: declare foo as pointer to function (pointer to const void) returning pointer to array 3 of int

tomjakubowski7y ago

In the interest of furthering annoying language smuggery, the rough Rust equivalent:

    foo: fn() -> Box<[i32; 3]>

Alternately, if the pointer is into static memory and not something allocated on the heap:

    foo: fn() -> &'static [i32; 3];

That's pretty nice to look at and not too hard to read. In my opinion, for commonly used syntax (like fn decls), some well-chosen punctuation marks (', ->, :, in this case) are often a boon to readability compared to keywords. So I think the Rust syntax in this case is nicer than Ada's.

But in any case, while complicated C declarations may be uglier and take more effort to read than those in other languages, they are at least tractable once you learn the trick of "declaration follows use" and working backwards as GP describes.

Separately, though, what do you mean by your "gimmick" comment?


jcranmer7y ago

I'm not defending C's syntax as sane here, because it's not. It boils down to two problems:

1. The syntax isn't "type id, id, id;", it's "type expr, expr, expr;" The trend for C-style languages has been to move to the former type syntax, so C/C++ is the anomaly here.

2. Pointer declarators show up to the left of the name while function and array declarators show up to the right of the name. This means you can't figure out the type by scanning in one direction. Contrast this with LLVM, where function arguments and pointer types both go to the right of the leaf type (while arrays are infix), or Rust, where they both live on the left of the leaf type.


userbinator7y ago

The other important part is to remember that operators in a declarator have the same precedence as in ordinary expressions, i.e. array subscripting and function call have higher precedence than pointer dereference.
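A small C sketch of that precedence rule, with made-up variable and function names: without parentheses the subscript binds first, so the object is an array; with parentheses the dereference binds first, so the object is a pointer.

```c
#include <stddef.h>

/* [] binds tighter than *, so this is an array of 3 pointers to int. */
int *array_of_pointers[3];

/* Parentheses make * apply first: a pointer to an array of 3 ints. */
int (*pointer_to_array)[3];

/* The sizes expose the difference between the two declarations. */
size_t size_of_array_of_pointers(void)
{
    return sizeof array_of_pointers;   /* three pointers */
}

size_t size_of_pointee_array(void)
{
    return sizeof *pointer_to_array;   /* three ints; operand is not evaluated */
}
```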

It is notable that the chapter in K&R which discusses declarations also presents a partial version of the cdecl program, and one of the exercises is for the reader to complete it, which really helps to dispel the notion that compilers are mysterious magic. In my experience, it's rare for an introductory book on a programming language to also contain such "hints" on how it could be implemented.

ambrop77y ago

However the declaration-mirrors-use idea does not apply to function arguments. If you have "void (* f)(int * arg)", you would not use it like "(* f)(* arg)" unless your arg is actually "int * * ".

This could be fixed. Instead of "void (* f)(int * x)" we would write "void (* f)(x &int)". Now it makes sense, the declaration says that we could call the function if we pass the address of some int y, as if by "(* f)(&y)". The specific syntax "x &int" says that the address of an int is x, the same way as "int * x" says that dereferenced x is an int.

What about "void (* f)(int x[10])" (pretending arrays could actually be passed)? With the pointer we relied on the existing opposite of the dereference operator, but there is nothing like that for arrays, that would make an array out of an element. Let's look to Python for inspiration, where the expression "[y]* N" will make a list of N elements with the value y. This gives us: "void (* f)(x [int]* N)". See how the declaration tells us that we could call the function using "(* f)([y]* N)" for some int y.

There's one more we need to solve: "void (* f)(void (* g)(int))". Since the parameter g of * f is a function pointer, we need to pass the address of a function, so clearly & will be involved. But we need a function to take the address of, and we don't have any available. Inspired by the C++ lambda syntax, let's invent function conjuration: "(Args) -> Ret" is an expression that conjures a function taking Args and returning Ret. Hence the solution: "void (* f)(g &(int) -> void)". It says that you could write "(* f)(&(int) -> void)", to call * f with the address of a conjured function taking an int and returning void.

We do need to be aware that the syntax for arguments in function conjuration expressions is the same as in top-level declarations. So we would need to rewrite "void (* f)(void (* g)(void (* h)(int * x)))" as "void (* f)(g &(void (* h)(x &int)) -> void)". So for each function pointer, its arguments must be declared in the other declaration mode.

Since this makes no sense at all, we have to conclude that the original C declaration syntax forms need to be deprecated and only the newly invented syntax forms should be used.

  x &int;   (int * x)
  x &&int;   (int * * x)
  f &(x &int) -> void;   (void (* f)(int * x))
  f &(x [int]* 10) -> void;   (void (* f)(int x[10]))

The new syntax can also be used for function declarations:

  main (argc int, argv [&char]*?) -> int
  {
      return 0;
  }

See how we've invented a different declaration syntax (some sort of dual of C's current syntax), that actually respects "declaration-mirrors-use" better than C does and makes much more sense to humans.

watergatorman7y ago

1) The use of the Python feature for arrays I find confusing as it is not orthogonal to the rest of your new and improved syntax for C.

Everywhere else, you change C's declaration order of <declaration-specifier> <declarator>, in your new syntax to place the identifier of the declarator first, followed by any pointer ops, and lastly the type. You are changing the pointer op "*" from a prefix that needed to be read right-to-left, after locating the identifier of the declarator, into a suffix "&" following the identifier, to be read left-to-right.

I agree that your change to left-to-right declaration order is definitely more readable.

2) But in your array syntax, borrowed from Python, the type is placed inside the array brackets, which used to hold the constant-expression denoting the array size. The array size is moved from within the brackets to be last, instead of the type being last, as in all your other syntax "rules". So, for arrays, the declaration syntax no longer reads simply left-to-right, since type is between declarator identifier and array size.

Wouldn't this be clearer, to have the type last and the constant-expression remain inside the array brackets? C syntax: (void (* f)(int x[10]))

use this instead for your new C syntax: f &(x [10] int) -> void;

3) I have a similar problem with your function syntax:

instead of:

main (argc int, argv [&char]*?) -> int { return 0; }

why not put the type last, so as to be consistent with all your other syntax?

main (argc int, argv [] &char) -> int { return 0; }

This is how the Go programming language does it, except for the preceding "func" reserved word and "string" in place of pointer to char: func main(argc int, argv [] string) int ...

5) The biggest problem I have is with adding "C++ lambda syntax" to C, to solve the problem of passing a function as actual parameter argument. That would mean you have 2 styles of pointers, one as a prefix and one as a suffix to the declarator identifier. So you now have to read both right-to-left and left-to-right, which seems to cancel out the benefits of only reading declarations in left-to-right order!

Would it be simpler, and preserve left-to-right declaration order, to provide a FunctionType as in the Go programming language? A parameter that is passed a function as argument is declared to have a FunctionType. Pointers to function are not apparently needed, at least not at the user level.

6) Q: How do these proposed changes affect the parsing of the new C syntax? Current C syntax can be parsed with predictive, non-backtracking parsers, in linear time. I don't want to use backtracking, GLR, or other complex methods, if they are avoidable. At least C can now be parsed with Yacc or Bison. (See A13 Grammar in K&R, "The C Programming Language" or Jacques-Henri Jourdan, François Pottier "A Simple, Possibly Correct LR Parser for C11")


Hex087y ago

*An array of 4 ints

Great explanation though, it really helps to read things inside-out

ramshorns7y ago

Which part are you correcting?

The declaration "int bar[3];" is an array of 3 ints, which are bar[0], bar[1] and bar[2]. Declaration mimics use but it's not exactly the same; in this case the size replaces the indices, which are all less than it.

ridiculous_fish7y ago· 4 in thread

Hey, this is my site, first published 2009! This is the venerable cdecl enhanced with blocks support.

It used to be a shared host with a PHP script shelling out to the cdecl executable, written in K&R C. Now it's that same executable running on AWS Lambda.

Yes Lambda really will run arbitrary ELF binaries.

buboard7y ago

Didn't know what blocks are; it seems it's an Apple extension.

saagarjha7y ago

Yup, they’re an extension to C/Objective-C/C++ implemented in Clang: https://en.m.wikipedia.org/wiki/Blocks_(C_language_extension...

erroneousboat7y ago

Thanks for creating it, it really helps with learning C.

kitd7y ago

Nice work.

Could you run it as a preprocessor macro?

valerij7y ago· 2 in thread

on topic of function pointers, is there a template to turn

  std::function<foo(bar, baz)>

into

  foo(*)(bar, baz)

?

bartbes7y ago

Sure. Here's one: https://godbolt.org/z/vWl4NE

valerij7y ago

huh. this was easier than expected

mey7y ago· 2 in thread

Need this for bash and by proxy regex.

bewuethr7y ago

There is https://explainshell.com - not Bash specific, though.

drewsberry7y ago

There's https://regex101.com/ for regex, does an excellent job, can't really ask for more

Jerry27y ago· 2 in thread

Tried it on this gibberish but it complains about syntax:

((void(*)(void))0)();

poizan427y ago

Besides not being a declaration, it really is gibberish: calling a null pointer is undefined behaviour. I believe the correct way of calling a function at address zero is ((void(*)(void))(intptr_t)0)(); which is merely implementation-defined.
C has the weird thing that a literal zero in a pointer context becomes a null pointer, which may not actually have the bit pattern 0. And when the optimizer sees a guaranteed null pointer it tends to optimize the whole branch away, since entering that would be UB. So if you try calling address zero with "((void(*)(void))0)();" you might end up with the whole function optimized away, as well as any function it gets inlined into.

tntn7y ago

That's not a declaration.

TorKlingberg7y ago· 1 in thread

I've been a professional C programmer for years, but I rarely find cdecl useful (command line or website). Not because complex C declarations are intuitive to me, but because cdecl fails on any unknown types. Real world C code is full of typedefs.

kitd7y ago

Could you not substitute in a known type, get the result and insert the unknown type back in afterwards?

Eli_P7y ago· 1 in thread

If I recall correctly, this one came as an exercise in Knuth's book on C programming; in the same place, C declarations and precedence were explained.

userbinator7y ago

The K in K&R is for (Brian) Kernighan, not Knuth.

Knuth does not use C in his books.

pkaye7y ago· 1 in thread

It's better to create a series of typedefs and build up the declaration. Most of the time you need those sub-typedefs anyway.
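As a rough illustration of that approach, here is the "pointer to function returning pointer to array 3 of int" type from earlier in the thread built up one typedef at a time. All the typedef and function names are invented for this sketch.

```c
typedef int int3[3];               /* array 3 of int */
typedef int3 *int3_ptr;            /* pointer to array 3 of int */
typedef int3_ptr getter_fn(void);  /* function (void) returning int3_ptr */
typedef getter_fn *getter_ptr;     /* pointer to such a function */

static int3 values = {1, 2, 3};

static int3_ptr get_values(void)
{
    return &values;
}

/* Built from typedefs; the same type as int (*(*foo)(void))[3]. */
static getter_ptr foo = get_values;

int sum_via_foo(void)
{
    /* Assigning to the raw spelling compiles cleanly because
     * the two spellings name the same type. */
    int (*(*raw)(void))[3] = foo;
    int3_ptr p = raw();
    return (*p)[0] + (*p)[1] + (*p)[2];
}
```

Each intermediate name is also the kind of thing cdecl cannot look up on its own, which ties into the complaint about typedefs elsewhere in the thread.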

nwmcsween7y ago

Extremely sparingly. Typedefs like those in glib are a nightmare, and arbitrary typedefs like char -> char_t are just useless.

unnouinceput7y ago· 1 in thread

tried: declare xxx as integer pointer to array of string equal to "mumu" and "kaka"

got: bad character '"'...apostrophe instead of double quote has the same result...well, I guess I expected too much

ComputerGuru7y ago

You are mixing type declarations and values. Foo equals bar is not a constraint that can be specified via the type system (generally speaking).

saagarjha7y ago

For all of its simplicity, the syntax for complex types in C is pretty horrible. Yes, I know the "inside out" rule, and can usually read these, but that doesn't make it any less bad.

nurettin7y ago

Back in early 2000s, we had bots on IRC doing this. My favorite technique was to pass the type to a template function, assign it to an integer and then parse the compile time error produced by gcc to extract the type.

SidiousL7y ago

The actual principle behind the C type declarations is "declaration follows use". Let me explain what this means. Take this declaration

   int *pi;

Means that when I dereference the variable pi, I get an int. This also explains why

   int *pi, i;

declares `pi` as a pointer to `int` and `i` as an `int`. From this point of view it makes sense stylistically to put * near the variable.

Declaration of array types is similar. For example,

   int arr[10];

means that when I take an element of `arr`, I obtain an `int`. Hence, `arr` is an array of ints.

Pointers to functions work the same way. For example,

   int (*f)(char, double);

means that if I dereference the variable `f` and I evaluate it on a `char` and on a `double`, then I get an `int`. Hence, the type of `f` is "pointer to function which takes as arguments a char and a double and returns an int".
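Putting the three declarations side by side with their uses, as a sketch only (code_plus and demo are made-up names, and the values are arbitrary):

```c
/* An ordinary function whose address f can hold (illustrative). */
static int code_plus(char c, double d)
{
    return (int)c + (int)d;
}

int demo(void)
{
    int *pi, i = 5;                      /* *pi is an int, i is an int */
    int arr[10] = {0};                   /* arr[k] is an int */
    int (*f)(char, double) = code_plus;  /* (*f)(c, d) is an int */

    pi = &i;
    arr[3] = *pi;                        /* use mirrors "int *pi" */

    /* use mirrors "int (*f)(char, double)" and "int arr[10]" */
    return (*f)((char)2, 3.0) + arr[3];
}
```

Every expression on the right-hand sides has the same shape as the corresponding declarator, and each one yields the int named at the start of the declaration.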

skookumchuck7y ago

The way to make complex C declarations legible is to use typedefs for the subtypes (like function pointers).
