Zinc: a low level language between assembler, C and C++ with Ruby-like syntax

(tibleiz.net)

116 points · tianyicui · 15y ago · 50 comments

psnj · 15y ago

I was surprised by my almost-panicky reaction to seeing:

  Identifiers Can Have Blanks

  open_window_with_attributes(...)
  becomes:
  open window with attributes (...)

I think I actually felt that wrongness in my stomach. Like a more intense version of seeing our corporate network shared drive's files with spaces and parens in them.

I guess I'm old.

scott_s · 15y ago

I had a similar reaction, and I'm not sure that it's a "damn kids, get off my lawn" reaction. Specifying an unambiguous grammar may be difficult - which implies parsing may become a problem.

An implementation exists, so the author has something working, but I'm wondering how robust the parsing is. I haven't seen many code examples (only short fragments on the page), so I don't know what potential issues, if any, there are. But this is the sort of thing that could significantly complicate adding new language features that require additional syntax.

edit: I'm perusing the source for the compiler, which is of course written in Zinc. This code from the main driver of the compiler perhaps gives a better feel for how it may look in practice:

  while i < argc
    def arg = argv[i]

    if is equal (arg, "-debug")
      debug = true

    elsif is equal (arg, "-v")
      version = true

    elsif is equal (arg, "-u")
      unicode = true

    elsif is equal (arg, "-o") && i < argc-1
      out filename = new string (bundle, argv[++i])
      to OS name (out filename)

    elsif is equal (arg, "-I") && i < argc-1
      append (include path, new string (bundle, argv[++i]))

    else
      filename = new string (bundle, arg)

    end

    ++i
  end

From an aesthetic point of view, it doesn't look that bad. In this example, I think "is equal", "out filename", "to OS name" and "include path" are all identifiers. But I'm still wondering what kind of parsing and lexing issues may arise.

nene · 15y ago

I already have a hard time parsing this code. The main problem I see is that to read the code I have to know every single keyword in the language.

For example I was wondering if "new" is a keyword. If it is, then "new string ()" might be something interesting, otherwise it's just a function call.

Similarly this raises a question of whether I can write the following code:

  if end of line (str)

This might or might not be permitted because "end" is a keyword. If it is permitted, then the result looks pretty damn ambiguous to me. If it's not then I have to name my identifier differently, like so:

  if end_of_line (str)

But then I'm screwing up the style of my code...

jhpriestley · 15y ago

Issues only arise if the language designer wants to use spaces for something else as well (like function application).

Raphael_Amiard · 15y ago

While I didn't panic, I find myself having quite a negative reaction to a language in which "Identifiers can have blanks" is listed under main features.

EDIT: Also, I see quite the opportunity for wrong parsing, not on the machine side, but on the human side. Blanks already have a function in other programming languages: they are there to separate symbols. By giving them this double meaning, you actually bring context into the parsing of any piece of code, which I think could be a pretty painful exercise.

Put another way: don't design a language feature because it makes code easier to type if it doesn't also make it easier to read.

(I know the author thinks it is easier to read, but I'm not yet convinced about that.)

Groxx · 15y ago

I don't see why parsing would be a problem; identifiers (and their pieces) always start with a letter (so no "var 1"), are alphanumeric, and cannot be a reserved word.

Meaning, parse word by word until you hit a keyword or a significant character (,:". etc). You can't have "varb function(arg)" or its equivalent in any language I know, because it doesn't make sense - there's no operation on the varb, it's just "there". Similarly, "x y z = q r t" is unambiguous, because there's no stop to parsing either "x y z" or "q r t".

I think I'd like it. Hitting shift all the time, or reaching for "_" is a PITA and significantly slows my typing. It's especially annoying when you realize that identifiers with blanks could be leveraged into most languages with almost zero change to the parser, as long as it requires an end-of-statement terminator or ends on newlines.
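The word-by-word rule described above can be sketched in a few lines of C. This is only an illustration of the idea, not Zinc's actual lexer: the keyword list, the function name `read_identifier` and the buffer handling are all assumptions made up for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical keyword list, for illustration only; not Zinc's real set. */
static const char *const KEYWORDS[] = {"if", "elsif", "else", "end", "while", "def"};

static int is_keyword(const char *word, size_t len) {
    for (size_t k = 0; k < sizeof KEYWORDS / sizeof KEYWORDS[0]; k++) {
        if (strlen(KEYWORDS[k]) == len && memcmp(KEYWORDS[k], word, len) == 0)
            return 1;
    }
    return 0;
}

/*
 * Reads one identifier-with-blanks starting at *src and writes it to out,
 * with a single space between its words. Stops before a keyword, before any
 * character that cannot start a word, at end of input, or when out is full.
 * Advances *src past the consumed words and returns how many were joined.
 */
size_t read_identifier(const char **src, char *out, size_t cap) {
    assert(src != NULL && *src != NULL && out != NULL && cap > 0);
    const char *p = *src;
    size_t used = 0, words = 0;
    out[0] = '\0';
    for (;;) {
        const char *q = p;
        while (*q == ' ' || *q == '\t') q++;         /* skip blanks */
        if (!isalpha((unsigned char)*q)) break;      /* a word starts with a letter */
        const char *start = q;
        while (isalnum((unsigned char)*q)) q++;      /* rest is alphanumeric */
        size_t len = (size_t)(q - start);
        if (is_keyword(start, len)) break;           /* a keyword ends the identifier */
        size_t need = len + (words > 0 ? 1 : 0);
        if (used + need + 1 > cap) break;            /* no room left in out */
        if (words > 0) out[used++] = ' ';
        memcpy(out + used, start, len);
        used += len;
        out[used] = '\0';
        words++;
        p = q;
    }
    *src = p;
    return words;
}
```

Fed "open window with attributes (x)", this yields the four-word identifier "open window with attributes" and stops before the parenthesis. Fed "end of line (str)", it yields nothing, because "end" is in the assumed keyword list, which is exactly the naming restriction raised earlier in the thread.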

jerf · 15y ago

That strikes me as giving the lie to the "Ruby-like syntax" claim; ask a Ruby programmer what that line means and you will not get the correct answer for Zinc.

Actually the connection with Ruby is tenuous anyhow; Ruby and assembler just don't go together. An assembler should produce a very clear one-to-one correspondence of instruction to machine language opcode, pretty much by definition. A high-level language can turn a simple statement into arbitrarily-complicated run-time code, pretty much by definition. Neither of these are criticisms by any means, it's just what they are. There isn't much syntax cross-talk to be had there.

stcredzero · 15y ago

>A high-level language can turn a simple statement into arbitrarily-complicated run-time code, pretty much by definition.

There are some high level languages where there is a pretty straightforward one-to-one correspondence of statement to bytecode(s).

>There isn't much syntax cross-talk to be had there.

Explain the existence of Forth.

speleding · 15y ago

Although I agree that it isn't really possible to have a "Ruby-like" syntax for a low level language, since a lot of the syntax depends on Ruby being dynamic, it still seems like a valid ambition, as long as you know that limitation.

I would love to have a form of C / C++ with iterators and blocks and without all the curly braces and assorted cruft like 5 different ASCII symbols being used in 20 different contexts (actually, Ruby does that too; when will language designers start using a few additional symbols to reduce cognitive load?).

Hexstream · 15y ago

From the article:

"I did this because I hate uppercase characters in the middle of identifiers and I'm too lazy to type shift to get the '_'. In addition, I find it more readable."

just-use-lisp-style-identifiers-then

JonnieCache · 15y ago

>just-use-lisp-style-identifiers-then

hitting - is not significantly easier than hitting _ when compared to hitting the spacebar.

http://en.wikipedia.org/wiki/Fitts_law

VMG · 15y ago

but then you sacrifice the "-" (minus) infix operator

JonnieCache · 15y ago

As much as your criticisms may be valid, I think he has given sufficient justification:

"I did this because I hate uppercase characters in the middle of identifiers and I'm too lazy to type shift to get the '_'. In addition, I find it more readable."

This kind of "Because I said so" reasoning is valid in pretty much any hobbyist-type situation as far as I'm concerned. If you don't like it, fork it.

thesz · 15y ago

Another Zinc: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.43.6...

That one is more popular, it eventually became OCaml.

mahmud · 15y ago

The Zinc paper is one of my all-time favorite implementation papers, right up there with Dybvig's thesis, the Rabbit and Orbit papers on Scheme, Reppy's thesis on Concurrent ML, and SPJ's Tagless paper.

The Zinc Experiment is Leroy at his best; compiler hacking lore meets programming language research (no hand-waving past performance issues, with a critical eye towards foundations.)

sb · 15y ago

That was my first reaction, too :)

paperclip · 15y ago

I will try to ignore the shallow (but horrifying) issue of identifiers including spaces.

The real question to be asked here is what is wrong with the current portable assembler (C)? C has occupied this niche for a long time and quite successfully - I believe all current mainstream kernels are written in C (or possibly a limited subset of C++).

If you want a 'portable assembler', a modern C compiler is, in my opinion, a good choice:

  - a solid specification: detailing the behaviour of operations, and what is defined, implementation-defined, or undefined behaviour.

  - access to platform specific features through builtins and intrinsics

  - ability to use inline asm if you really want to (or need to)

  - easy integration with existing libraries

  - minimal dependencies on a runtime library (pretty much none in freestanding implementations)

  - most compilers give you ways to get good control of both what code is generated and structure layout.

The modern C ecosystem provides (mostly good) tools for:

  - tracking memory leaks/invalid memory accesses (valgrind)

  - static analysis (clang static analyser, sparse, coverity, ...)

  - debuggers (gdb ...)

  - solid optimizing compilers (icc, gcc, llvm)

  - profilers (oprofile, perf, vtune, ...)

Admittedly, most of these tools don't depend on the code being written in C, but I suspect any new language would take a while to get properly integrated. If you want to use a low level language, you really want to have access to these tools or equivalent.

A new language trying to compete in this space would have to offer something fairly substantial to get me to switch - and a strange syntax like Zinc's is not going to help. From the documentation at least, Zinc currently seems to be missing: an equivalent to volatile; asm; any way to access a CAS-like instruction; 64-bit types; floats; a way to interface to C code; clear documentation about behaviour in corner cases (what happens if you left shift a 32-bit value by 40?). The only thing it seems to bring to the table to compensate is the ability to inherit structures.
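For comparison, C itself leaves the shift case mentioned above undefined: shifting by a count greater than or equal to the width of the promoted left operand is undefined behaviour. Here is a minimal sketch of one way to get a defined result, assuming you want over-wide shifts to produce 0. The helper name `shl32` is hypothetical, and other policies (such as masking the count) are equally defensible.

```c
#include <assert.h>
#include <stdint.h>

/* Left shift with a defined result for every count: counts of 32 or more
 * yield 0 instead of invoking undefined behaviour. */
uint32_t shl32(uint32_t x, unsigned count) {
    if (count >= 32) return 0;
    return (uint32_t)(x << count);
}

/* Small self-check showing the intended behaviour. */
void shl32_check(void) {
    assert(shl32(1u, 4) == 16u);
    assert(shl32(1u, 31) == 0x80000000u);
    assert(shl32(1u, 40) == 0u);   /* the case asked about above */
}
```

A language aiming to be a portable assembler would need to document which of these policies it picks, or state that it inherits whatever the target does.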

haberman · 15y ago

I agree with you. I just wanted to list the one complaint I do have about C: missed optimization opportunities due to lax aliasing rules.

Consider the following C translation unit:

    void foo(const int *i);
    void bar();

    int baz() {
      int i = 1;
      foo(&i);
      return i + 1;
    }

    int quux() {
      int i;
      foo(&i);
      i = 1;
      bar();
      return i + 1;
    }

You'd like to think that both baz() and quux() could compile the return statements to a constant "return 2." After all, foo() is taking a pointer to a CONST int. But alas, this is not the case, because foo() could cast away the const. So in truth, both functions are forced to reload the integer from the stack, add 1 to it, and then return that! You can't use any values you had loaded in registers (or in this case, you can't evaluate the expression at compile time).

My example is contrived, but you can easily construct examples that fit the same pattern and are real.

I've heard that Fortran still beats C in optimization in some cases; I would expect that the above is one major reason why. C99's "restrict" addresses some of the difference but cannot help you with the above.
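To make the const point concrete, here is a minimal sketch of definitions for foo() and bar() that would defeat the constant folding. The global `stashed` and the value 41 are made up for illustration; baz() and quux() follow the shape of the example above. Because `i` itself is not declared const, writing to it through a pointer with the const cast away is well-defined C, which is why a compiler that cannot see these definitions has to reload `i`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical global used to smuggle the pointer from foo() to bar(). */
static int *stashed = NULL;

/* Takes a pointer to const int, but casts the const away and writes anyway.
 * Legal, because the object it points at was not defined as const. */
void foo(const int *i) {
    stashed = (int *)i;
    *stashed = 41;
}

/* Writes through the pointer that foo() saved earlier. */
void bar(void) {
    if (stashed != NULL) *stashed = 41;
}

int baz(void) {
    int i = 1;
    foo(&i);          /* i is now 41 */
    return i + 1;
}

int quux(void) {
    int i;
    foo(&i);          /* saves &i for later */
    i = 1;
    bar();            /* overwrites i through the saved pointer */
    return i + 1;
}

/* With the definitions above, neither function returns the "obvious" 2. */
void check_not_constant(void) {
    assert(baz() == 42);
    assert(quux() == 42);
}
```

The saved pointer dangles once quux() returns, so this is a demonstration of the aliasing problem and not a pattern to copy.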

fanf2 · 15y ago

The main problems with C are inability to control memory layout in fine detail, and lack of control over the calling sequence - you can't portably get a tail call. Have a look at the C-- work by Simon Peyton Jones and Norman Ramsey and others for more details.

humbledrone · 15y ago

I guarantee that I would confuse the types "byte" (uint8_t) and "octet" (int8_t). The typical distinction between a byte and an octet has to do with the number of bits in the representation (a byte usually has 8, an octet always has 8). I don't know of any convention for bytes being unsigned and octets being signed.

joubert · 15y ago

You're right that with "byte" there isn't an official size specification, although the de facto size is 8 bits, unlike with "octet", which was specifically defined as 8 bits (for interoperability between different systems).

Regarding the question of signed/unsigned - I'll try to explain:

byte - unsigned

On page 37 of the C99 standard: "A byte contains CHAR_BIT bits, and the values of type unsigned char range from 0 to 2^CHAR_BIT - 1."

i.e. according to the C99 standard, a byte is unsigned.

octet - signed

Think of an octet in two ways: the concept of something that is exactly 8-bits on the one hand, and on the other hand, the technical representation of this concept.

When you read the literature you'll notice that an octet refers simply to the size of something (8 bits) and not its signedness. For example, octets arguably arose in the networking world, and the NDR (Network Data Representation) refers to octets in a sign-neutral way.

On page 256 of the C99 standard: "The typedef name intN_t designates a signed integer type with width N, no padding bits, and a two's-complement representation. Thus, int8_t denotes a signed integer type with a width of exactly 8 bits."

Now, how would you go about representing the concept of an "octet" (which is sign-neutral)? If you used an unsigned 8 bit integer, you can't represent the sign of the (conceptual) octet, while a signed 8 bit type can.
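One way to see the sign-neutral point: the same eight bits can be read as either an unsigned or a signed value, and nothing in the bits says which. A minimal C sketch follows. The function names are hypothetical, and it assumes a platform that provides the exact-width types, which C99 makes optional.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the same 8 bits as a signed value. int8_t is two's complement
 * with no padding bits, so copying the representation is portable wherever
 * the type exists. */
int8_t octet_as_signed(uint8_t bits) {
    int8_t value;
    memcpy(&value, &bits, sizeof value);
    return value;
}

/* The bit pattern is identical in each pair; only the reading differs. */
void octet_demo(void) {
    assert(octet_as_signed(0x7F) == 127);    /* 127 unsigned, 127 signed  */
    assert(octet_as_signed(0x80) == -128);   /* 128 unsigned, -128 signed */
    assert(octet_as_signed(0xFF) == -1);     /* 255 unsigned, -1 signed   */
}
```

Seen this way, the names "byte" and "octet" carry no signedness on their own; which reading applies is a choice made by the type.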

yawniek · 15y ago

Looks interesting, but I can't get it to work on OS X or Linux.

sanxiyn · 15y ago

Works for me. Here is how to do a minimal 3-stage bootstrap.

  gcc bootstrap/io.c bootstrap/zc.c -o zc1
  ./zc1 -I lib -I lib/platform/default -I src src/main.zc -o zc2.c
  gcc lib/libc/io.c zc2.c -o zc2
  ./zc2 -I lib -I lib/platform/default -I src src/main.zc -o zc3.c
  cmp zc2.c zc3.c # should be identical

wbhart · 15y ago

What is wrong with 64 bit integers? Maybe they've been indicted on war crimes or something. The number of languages that appear and don't support them.... And what about interfacing with C? I can count the languages on one hand that have a simple and efficient C interface! (I have a list of other things almost always ignored by languages for no good reason... efficiency, friendly license, lack of macros or ability to extend the language...)

wildmXranat · 15y ago

Interesting find. It seems to have been left to collect dust. Last changes are about 3 years ago.

timrobinson · 15y ago

This reminds me of a slightly higher-level High Level Assembly: http://en.wikipedia.org/wiki/High_Level_Assembly

Edit: the HLA web site always used to be a decent place to learn assembly language. I don't remember it being so mauve though: http://homepage.mac.com/randyhyde/webster.cs.ucr.edu/index.h...

gasull · 15y ago

Related: assembly programming with Python syntax.

http://www.corepy.org/

gcv · 15y ago

From the description: "The goal is to have a portable assembler." Why not just use Fortran?
