Smaller Code, Better Code

(sacrideo.us)

137 points · jpt4 · 9y ago · 157 comments

72 comments · 16 top-level

natch · 9y ago · 17 in thread

From the project:

...

rth,←' A zs;A rs=scl(r.v(0));rr##mf(zs,rs,p);if(c==1){z.v=zs.v;R;}\',nl

  rth,←'  array v=array(z.s,zs.v.type());v(0)=zs.v(0);\',nl

  rth,←'  DO(c-1,rs.v=r.v(i+1);rr##mf(zs,rs,p);v(i+1)=zs.v(0))z.v=v;)\',nl

  rth,←' DL(zz,if(rr##scl){rr##df(z,l,r,p);R;}\',nl

...

No.

And commit messages like "Hopefully that does it." No again.

arcfide · 9y ago

Don't complain that Chinese is ugly and unreadable just because you speak English as your native tongue.

Technically, the above is a snippet of C++ put into an APL variable "rth" but there's so much more to it than that, and so much more to the design that you're missing.

The design and choice of aesthetic in the compiler is a very intentional one that is arguably one of the main issues that has caused me to rewrite the compiler so many times over the years and has led to this massive code adjustment.

There are very good reasons that the compiler is written in the style that it is, and you cannot compare it to other projects' style guides.

Keep in mind that this compiler is designed to run natively on the GPU in a fully data-parallel fashion.

One major issue that I had to address, and I discuss a little bit in a thread above, is the idea of the malleability of the code base. It's critically important to this project that I be able to adapt and alter the compiler rapidly. For example, I recently had to rewrite the entire backend due to a shift in some underlying core technology. This shift led to a shrinkage of about 2000 lines of code because the underlying supporting libraries were a better fit to what I needed than what I was using before. But I might not have been willing or able to make this change if I didn't have confidence that the rewrite would be swift. Indeed, it took only two months to rewrite the backend from scratch, add more new features, improve robustness, and so forth. The code also got cleaner.

This obsessive need to be highly adaptable leads me to the desire to have exceptionally "disposable" code. The cost for replacing or deleting code should be as low as possible.

This has a few follow-ups. In order to achieve the above, I need to ensure that I understand the ramifications of deleting code as readily and as quickly as possible. This basically means that I need to be able to squeeze as much of the compiler into my head as possible, and what doesn't fit, I need to be able to "see" and "read" as quickly and as readily as possible.

The compiler is designed so that I can see as much as possible with as little indirection as possible, so that when I see a piece of code I not only know how it works in complete detail, but how it connects to the world around it, and every single dependency related to it in basically one single half screen full of code (usually much less than that) without any jumps, paging, scrolling or any movement. It means that I can completely understand the ramifications of any edit I make in nearly complete detail without any dereferencing or indirection. There are one or two places where there are some helper utilities which are on a different page, but these are part of the "domain vocabulary" which is basically in my mental cache any time I'm working with that code. I keep these "helpers" to a minimum, so that they can fit with anything else I want and not waste mental space in my head. Too many helpers leads to a failure to understand the complete macro picture and thus defeats my ability to delete code.

In order to make the code more readable, it has to be highly consistent and idiomatic. I take this to an extreme level. This code is highly regular and predictable, to an almost obsessive degree. I do this by enforcing a style discipline on the code that allows me to eliminate the use of a host of abstractions, further paring down the complexity of the programming language in which I'm working and allowing me to think in the same mental plane at all times.

The idea of semantic density is critical to this point. The semantic density of the APL code I'm using to solve the problem is at a certain rate. I maintain a consistent density rate by choosing my variable names in such a way that they visually align with the expressivity per character of the built in primitive symbols. This means that the cadence when reading the code is maintained. The "universal" naming scheme allows me to take any given name and know exactly its purpose, parentage, place, and use in the compiler without adding any additional cognitive overhead of inheritance syntax, datatypes, classes, or anything more than a name.

The C++ code above is written the way it is to allow it to stylistically align with the semantic density of the APL code. This means that I can jump between the runtime and the compiler portions of the code with minimal mental shifts between the two, because the style and approach are similar. The code can be "read" in much the same way with minimal change. I am intentionally prioritizing internal semantic and stylistic consistency over satisfying the popular expectations of how C or APL code should look. I believe the internal consistency within the project contributes more strongly to the day-to-day readability and hackability of the project.

Furthermore, I strongly restrict my use of programming language features. This simplifies self-hosting, but it is primarily a means of maintaining stylistic and cognitive power. Since I know how I need to think about my problem ("compilation on the GPU") in order to make it go, I can restrict myself to a paradigm that only allows me to think in this way. I choose a paradigm that is also exceptionally expressive to allow me to be productive as well. By selecting the right core paradigm, I can eschew further programmatic abstractions since they contribute nothing and only cost something.

One way in which I do this is to write the core of the compiler with only one or two syntactical conventions, and only one main programming method: function composition. The entire core of the compiler is a single points-free (almost), data-flow, data parallel expression. Names provide the anchor points of the "macro" level ideas, but the language is expressive enough that I need very few other anchor points. Instead, I use only function composition over the core primitives with a syntax known as "trains" to create the mental effect of working with normal expressions when in reality I create new functions with every line in the core compiler (which is 90 lines or so). By restricting myself to only writing in this style, the mental effect works. If I had to switch between expression level and trains/points-free style in the code, it would be much less readable. But because I can now treat my points-free programs as regular expressions for all intents and purposes, it actually simplifies my cognitive load, as there is only one thing to think about: function composition.
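For readers who have not met APL trains, here is a rough sketch in Python of the idea of building new functions purely by composition. The helper names `fork` and `atop` are illustrative assumptions, not anything taken from the Co-dfns source; they mimic the usual reading of a three-function and a two-function train applied to a single argument.

```python
from operator import sub, truediv

def fork(f, g, h):
    """Three-function train on one argument: (f g h) x == g(f(x), h(x))."""
    return lambda x: g(f(x), h(x))

def atop(f, g):
    """Two-function train on one argument: (f g) x == f(g(x))."""
    return lambda x: f(g(x))

# No argument is ever named below: each definition is only a composition.
mean = fork(sum, truediv, len)      # sum divided by count
spread = fork(max, sub, min)        # largest minus smallest
rounded_mean = atop(round, mean)    # round applied on top of mean

print(mean([1, 2, 3, 4]))
print(spread([3, 9, 4]))
print(rounded_mean([1, 2, 3, 5]))
```

In APL the mean above is conventionally written as the three-glyph train `+/ ÷ ≢`, which is roughly why such definitions can be read like ordinary expressions.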

arcfide · 9y ago

By keeping the code as visible (read, small) as possible, I see more code and can better reason at a macro level. To scale this down into the micro level of dealing with individual compiler passes, I replace all the traditional programming paradigms with others in a sort of 1 for 1 exchange. In this way, I develop a new set of idiomatic programming methods that are so concise, they can begin to be read as we read and chunk English phrases. By doing so, it becomes actually easier to just write out most algorithms, because the normal name for such an algorithm is basically as long as the algorithm itself written out. This means that I start to learn to chunk idioms as phrases and can read code directly, without the cost of name lookup indirection. I can get away with this because I've made reusability and abstraction less important (vastly so) because I can literally see every use case of every idiom on the screen at the same time. It literally would take more time to write the reusable abstraction than it would to just replace the idiomatic code in every place. It's a case of the disposability of code reaching a point that reusability is much less valuable.

This means that in those cases where reuse is valuable, it's very valuable, and it comes to the fore and you can see it as the critical thing that it is. It doesn't get drowned in otherwise petty abstractions that assist reusability, since we don't need that anymore.

Furthermore, if I write my code correctly, there is very, very little boilerplate in the compiler. Almost none. This means that every line is significant. By doing this it means that you don't get the fun of feeling like you're accomplishing something by typing in lots of excess boilerplate, but it does mean that you have no wasted architecture. Because rewriting the architecture is so trivial, basically everything now becomes important, and you don't have petty bookkeeping code around. You know that everything is important, and there are no superfluous bits.

The result, as mentioned elsewhere, is code that is getting continuously simpler, rather than continuously more complex. The code is getting easier to change over time, not harder. The architecture is getting simpler and more direct and easier to explain. Because it costs so little to re-engineer the compiler, I can do so constantly, resulting in little to no technical debt.

This is an intentional synergistic choice of a host of programming techniques, styles, disciplines, and design choices that enables me to program this way. Give up one of them and you start to break things down. It allows for a highly optimized programming code base that has all of the desirable properties people wish their code bases have, and it scares people. I think that's a good thing. Because I don't want people to see this codebase as just another thing. I want them to see that this is something truly different. How can I get away with no module system? How can I get away with no hierarchy? How can I get away with having everything at the top-level, with almost no nested definitions? How can I get away with writing a compiler that is not only shorter, but fundamentally simpler from a PL standpoint than standard compilers of similar complexity by using only function composition and name binding? How can I get a code base that has more features but continues to shrink?

By chasing smaller code. :-)

I assure you, and I'll make good on this in another reply here, I could get you up and running on understanding the code and how it works faster than just about any other compiler project out there. In the end, one of the goals I want for this compiler is for people to say, "Woah, wait, that's it? That's trivially simple." The more I can push people to think of my compiler as so trivial as to be obvious, the more I win. The compiler really is so dirt simple as to shock any normal compiler writer.

But to make it that simple, I have to do things in ways that people don't expect, because people expect complexity and indirection, they expect unnecessary layers for "safety" and they expect code that needs built in protections because the code is too complex to be obviously correct.

I'm pushing the other direction. If you can see your entire compiler at one go on a standard computer screen, what sort of possibilities does that open up? You can start thinking at the macro level, and simply avoid a whole host of problems because they are obviously wrong at that level. And when you aren't afraid to delete your entire compiler and start from scratch, what sort of possibilities does that open up to you?

userbinator · 9y ago

> Don't complain that Chinese is ugly and unreadable just because you speak English as your native tongue.

That's a great counterargument, and one I fully agree with. I've noticed that over the years, there has been a growing trend of promoting "readable, maintainable, clean, insert-fashionable-adjective-list-here code" which really amounts to a lowest-common-denominator, dumbing-down perspective of how software should be written. In their perspective, code that someone does not immediately understand is "bad", seemingly regardless of how much (or little) knowledge that someone possesses. I think this is ultimately a harmful trend.

The opposing view, which appears to be largely a minority in more mainstream language communities but dominates in others like APL and Asm, is that programming languages are essentially like human languages: they need to be learned, are not necessarily "easy" or "familiar", and this learning and eventual mastery is wholly beneficial to their use. As with human languages, it is not expected nor a problem that a beginner will immediately understand code written by a more advanced user. Instead, the beginner progresses by learning the language and eventually becoming an advanced, "literate" user. This can be summed up in one sentence: "The code is unreadable because you are not yet qualified to read it." ;-)

lmm · 9y ago

I would be more sympathetic to this argument if the code was visibly a collaboration.

I am perfectly willing to believe that I could reduce the size of my code by a factor of 10, maybe even 100, if I was willing to give up the constraint of making it maintainable independently of myself. I think that would be a poor tradeoff to make in most cases.

Aardappel · 9y ago

I agree short/dense/simple/linear code has huge benefits that most programmers haven't experienced, simply because it is so hard to create (especially in some languages). Your code is both impressive and inspiring.

What additionally interests me is the combination of points-free style and the kind of data structures you're processing in an array-biased language, could you give an insight on what that is like to work with?

In particular, I presume from your description, and only a conceptual familiarity with APL, that most or all of this code is "functional", i.e. all data structures exist as values passed between the composed functions, and nowhere else (no globals or similar). I'd love to hear more about the predominant data structures and what shape they take.

Somewhere else you mention Quad-XML, which seems to be a way to represent trees as arrays, with each element pre-fixed with its depth. I presume you use this for the AST? What kinds of operations are simpler on these arrays, and which are harder, compared to tree data structures used in other languages? For example, addressing the Nth child from a parent could be harder, since you have to search past the other children? I could imagine that operations like "set all fields X of the tree to Y" are a lot easier since no tree traversal is required.

Does your ability to quickly refactor rely on this functional nature?
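The trade-off asked about above can be made concrete with a small sketch. It assumes a tree stored in preorder as parallel columns, with one column holding each node's depth; the node kinds, values, and function names are hypothetical and are not taken from Co-dfns or from Quad-XML itself.

```python
# Hypothetical tree in preorder, one entry per node in each column.
depth = [0, 1, 2, 2, 1, 2]
kind = ["Fn", "Expr", "Var", "Num", "Expr", "Num"]
value = ["f", "+", "x", "1", "neg", "2"]

def set_all(column, kinds, target_kind, new_value):
    """Whole-tree update: a flat map over one column, no traversal needed."""
    return [new_value if k == target_kind else v for k, v in zip(kinds, column)]

def children(depth, parent):
    """Direct children of `parent`: scan forward until the depth drops back."""
    out = []
    for i in range(parent + 1, len(depth)):
        if depth[i] <= depth[parent]:
            break  # left the parent's subtree
        if depth[i] == depth[parent] + 1:
            out.append(i)
    return out

def nth_child(depth, parent, n):
    """Nth child needs the scan above, unlike a pointer tree's child list."""
    return children(depth, parent)[n]

def parents(depth):
    """Parent index of every node (-1 for the root), from the depth column alone."""
    last_at = {}  # most recent node index seen at each depth
    out = []
    for i, d in enumerate(depth):
        out.append(last_at.get(d - 1, -1))
        last_at[d] = i
    return out

print(set_all(value, kind, "Num", "0"))
print(children(depth, 0), nth_child(depth, 0, 1))
print(parents(depth))
```

In this layout "set every Num to 0" is a single pass over a column, while finding a particular child means walking past its siblings' subtrees, which matches the intuition in the question.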

dmitriid · 9y ago

> Don't complain that Chinese is ugly and unreadable just because you speak English as your native tongue.

This is an argument from analogy, and with a certainty nearing 100% it doesn't apply to programming languages.

If you want to take this argument all the way though... Why not use Japanese instead?

It's ugly, it's unreadable, it takes countless hours to master the language, the grammar, the writing system. In the end you arrive... at yet another language[1]. Which may or may not express some things that English can't. By the time you've mastered Japanese, you'll have achieved near perfection and all your goals in English :)

[1] I speak four and currently am in the process of learning a fifth language (Russian, Romanian, Turkish, English, Swedish). I can say with some "expertise" that you can't make direct comparisons between natural and computer languages.

dang · 9y ago

Snarky dismissals are not ok on Hacker News, especially not when they're advocating an entirely conventional and dare I say middlebrow position.

When faced with something unconventional, the reaction we're hoping for from HN users is first to pause—and then to reflect. If after pausing and reflecting you want to argue that the conventional position is right, you'll be able to do that thoughtfully and with some sense of nuance.

natch · 9y ago

Point taken.

foxhill · 9y ago

in defence of the parent's snarkiness, this code is disgusting.

imagine being presented with this and tasked with maintaining it. or adding a language feature. i'm certain the author could do it without much effort, but this code is so short as to be obfuscated - i have had more understanding from ioccc entries than this.

code exists as a common language for humans to understand and collaborate. this code is nightmare-ish.

nickpsecurity · 9y ago

I was tempted to agree after a glance at this thread but thought this looked like APL. Checking on the project confirmed it's an APL variant. APL is some weird-looking stuff I've never even tried to learn as I believed I didn't need it in place of languages + libraries that do similar jobs well with familiarity. The previous threads on HN about APL had similarly-weird code that the APL vets showing up thought was anywhere from fine to beautiful. This tells me we can't judge the quality of APL-like code unless we've dug into that paradigm and know what good, APL-like code looks like. Like other paradigms that are really different.

Are you an experienced user of array-oriented programming languages? If so, what specific things about the code were bad other than the shortened names someone else mentioned?

arcfide · 9y ago

Did the above code snippet remind you of APL or was it something else? This is actually a bit of an important "research" oriented question to me and is actually relevant to the design of the Co-dfns compiler.

RodgerTheGreat · 9y ago

Are you going to articulate your objection to that code or just sneer at it unconstructively?

libeclipse · 9y ago

There's a lot of people jumping on the bandwagon of down-voting anyone who dares to criticise the code, but I'm going to give it a go anyway.

There's a reason why readable and beautiful code is favoured: it's so that anyone else that opens the source and tries to understand it doesn't have a difficult time, and therefore, anyone that tries to contribute doesn't have a difficult time either.

Looking at the project's Github page, I can see that there's no contributor even coming anywhere close to the project owner. Whether that's because of the obscurity of the codebase or for another reason, I can't comment. However, it does stand that the project owner is the only real contributor, and so the minimum that he himself has to consider is if he can understand the code.

Having said that, looking at the code does make me cringe. I'm sorry if that offends anyone but it is what it is: the code is not very nice to look at. It seems as though it has been engineered to be as obfuscated and shrunken as possible, without any regard for readability. I mean just the file names themselves: was there really any need for single-letters?

Now the author claims that (and a lot of other people agree with him on this) it is not for the purpose of what I outlined above, but rather, as mentioned before, so that he can understand it all easily and rapidly modify it. Whether or not that's the case I do invite you to consider the fact that the post that we are all replying to is somewhat bragging about the extremely small size of the codebase.

Personally, I think this sort of code would fit in rather well on a code-golfing forum or something similar, not on a production system. Then again, it is a personal project so ¯\_(ツ)_/¯

arcfide · 9y ago

Don't worry about offending me with that comment. I have a pretty strong belief in why I'm coding this way, so I'm glad to have the opportunity to work with people like you who find the code scary and disgusting and see if I can't either change your mind or change the code to be better.

Firstly, about this being a personal project, it's actually a bit more than that. It serves as a research platform for a research agenda around the usability of programming languages and the HCI and pedagogy of computation, yes. However, it's also a commercially funded compiler that is commercially licensed and distributed. The compiler is still in early stages, so it's largely a boutique offering at this point, but that's scheduled to change this year or maybe the next. And yes, the development team that works with me on this compiler has read the code, and while they are not as fluent in it as I am, they understand how to work with it and we can talk about the compiler and work through issues in the compiler that come up.

Indeed, the fact that the compiler is so easy to track through at a macro level has allowed us to avoid needing extra documentation throughout, because when we have a question about some level of architecture design, we can usually pull up a page of the compiler and work through it without needing any other documentation.

And, the point of the post above was that small code is a useful metric for pushing for simplicity. There is a difference between obfuscation and small code, but my code is not obfuscated to those who need to work with it. It is obfuscatory for anyone who expects to read it like a normal program.

At the heart of this is the meaning of readability. You've implicitly defined readability as being a state for code bases that allows anyone else to read and understand the code. That's a high bar. If I write a standard proof of the uncountability of the real numbers, it's a rather high bar to say that everyone should be able to read that proof.

Also, if you look at the way that the Clang codebase is engineered, for instance, if you take any one snippet of 50 lines or so of code, it's all nice, neat, and readable. But when it comes to understanding the entire compiler as a whole, the codebase is completely unreadable. It requires external documentation to understand almost any part of that code at a macro level.

But Clang uses best industry practices and is, on the whole, what most people would consider very cleanly written code. And yet, it is essentially impenetrable from a macro level without other documentation.

Instead, I'd submit that readability is something that we should consider valuable for those who have the relevant pre-requisite understandings of key ideas and concepts when looking at a new code base.

Part of the problem is that this code base is introducing new, research level ideas into the coding space. There is a fundamental difference between it and other compilers in the approach that it is taking, and thus, you can't just look for the same patterns.

I've already touched on malleability elsewhere. SICP has a classic quote about the importance of malleability in code (amoeba vs. pyramid programming).

I'll give a simple description of the architecture. If you understand everything said in the following sentence, then you'll have no trouble understanding how the code is written, and if you don't, then learning these sub-domains of programming skillsets will go a long way in helping to clarify the design.

It's a three part dfns->C++ offline batch compiler overloading the standard Quad-Fix system function in APL interpreters to compile whole, closed namespace scripts built of pure functional dfns on the Dyalog 15.0 primitive vocabulary sans-guards through a PEG parser to a core compiler written in a Nanopass compiler architecture over a linearized Quad-XML style matrix AST representation where each pass is written as a data-flow, data-parallel function train leading to a single dispatch code generator with a runtime library header prepended to each output file containing implementations of each implemented APL primitive.

Those would be the basic set of techniques and skills that are being put to use in the compiler. If you already understand Nanopass, PEG parsers, the Quad-XML tree linearization format, function trains, and so forth, then the structure and format and design of the compiler is obvious and easy to work with after about 5 minutes of orientation. If you don't have that background, then understanding that part of the compiler is rather a difficult one. In addition to this, there are new techniques being used and applied to solve problems in this compiler itself, and those are being documented through the papers that I'm publishing on these techniques:

https://github.com/arcfide/Co-dfns#publications

Most people don't have a strong data-parallel, array style programming background, which makes the micro-level code the hardest part to understand for them. However, if you are experienced in that background, then working with the compiler passes is not difficult, provided that you take the time to understand the core idioms in play.
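As a loose illustration of the "many small passes over a flat AST" shape described in the architecture sentence above, the following sketch chains a few tiny passes over a list of rows. The row layout `(depth, kind, value)`, the node kinds, and every pass name are assumptions made for this example; none of it is the actual Co-dfns code, which is written in APL.

```python
from functools import reduce

# Hypothetical flat AST: one (depth, kind, value) row per node, in preorder.
ast = [
    (0, "Fn", "f"),
    (1, "Comment", "todo"),
    (1, "Expr", "+"),
    (2, "Var", "x"),
    (2, "Num", "1"),
]

def drop_comments(rows):
    """Pass 1: remove comment rows (leaf nodes in this toy AST)."""
    return [r for r in rows if r[1] != "Comment"]

def parse_numbers(rows):
    """Pass 2: turn numeric literal text into Python ints."""
    return [(d, k, int(v)) if k == "Num" else (d, k, v) for d, k, v in rows]

def emit(rows):
    """Final pass: print-style code generator, indenting by depth."""
    return "\n".join("  " * d + f"{k} {v}" for d, k, v in rows)

def compile_ast(rows, passes=(drop_comments, parse_numbers, emit)):
    """Run each pass on the output of the previous one."""
    return reduce(lambda acc, p: p(acc), passes, rows)

print(compile_ast(ast))
```

Each pass does one small thing and can be deleted or replaced without touching the others, which is the property the surrounding discussion keeps returning to.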

So, in summary, I'd say that you're right that the code looks horrendous, because your heuristics are designed for code that is completely, almost assuredly, fundamentally different than this code. However, like I said, come to the live session and see me explicate the architecture of the compiler. I'll explain a lot of the ideas I mention in the above sentence enough to allow you to walk through the code easily. If you still think it's scary, okay. I'd appreciate some ways to make it easier to work with it on a day to day basis.

arcfide · 9y ago

Sorry! I just realized that I forgot to answer the question about file names. The filenames themselves are a bit of a cultural homage to historical APL development. They are a little bit of a part of my push to stay small, because if I go beyond 26 or so files, I'm in trouble. But it's also a little bit of a "self documenting" element. There's a famous example of the style of C coding that I'm doing here from the author Arthur Whitney, the K developer. He famously whipped up a little J interpreter prototype that was about a page of code and Kenneth Iverson spent some time studying that code to understand its structure and layout and found it interesting. Whitney famously tended to write software in a very ascetical style and just used single letter names for his files.

The use of single letter names in the files here is a bit of an inside joke, referencing back the style of programming of Arthur Whitney, signaling a bit of a historical "stylistic" or artistic connection, while at the same time being the first "alert" to the programmer that they are likely to see something along the lines of Whitney style C code inside of the files. It serves both as a chuckle to the APL community as well as a documentation of how you might want to prepare your mind before reading the code.

arcfide · 9y ago

Oh, and on another note, I've found that it's mostly programmers and computer scientists who struggle the most with the code. I've tried this style of programming out with high school students with little to no programming background, and they were able to pick it up and use it to do more in 12 hours than most students in an entry-level undergraduate course did in the first half of their semester.

dang · 9y ago

> There's a lot of people jumping on the bandwagon of down-voting anyone who dares to criticise the code

Please omit such offensive/defensive rhetoric from your posts to HN. It adds no information and is bad for conversation.

The problem here isn't "daring" to criticize, it's rejecting the unfamiliar. This is like traveling to a new country and complaining because they cook everything wrong and say everything wrong. Unfamiliarity is relative—it's not a property of the thing you're reacting to. Same with readability: it's relative to the reader.

In some contexts this is obvious. If you don't know German, you wouldn't reject a German text as unreadable or poorly written. But in other contexts, when we unconsciously assume or were taught that there's only one valid way to do something, we react with shock and distaste at work that violates known conventions. Such work may in fact be organized around different conventions for reasons we don't yet see. Good conversation across such boundaries requires a bit of distance from our own assumptions.

Programming is like the world of art this way. There are countless examples in art history of sharp departures from convention provoking shock and distaste, and people saying things like "There's a reason why readable and beautiful [art] is favoured". Riot police famously had to be called to the early shows of the Impressionists, yet the beauty of their paintings is obvious to us now.

arcfide · 9y ago · 15 in thread

As the author of this code in question, I'd like to make the offer to the Hacker News community and anyone at large. I'll do a live screen cast demonstration for interested persons and walk you through the entire compiler in 30 minutes to 1 hour. In the end you won't have a complete understanding of the compiler, but if you have reasonable prior programming experience, I claim that you will have a better, more full, and complete understanding of the compiler than if you had spent the same amount of time learning most other compiler designs. At that point, you would be able to continue your own self-study and would be able to start making contributions to the compiler rather quickly. This is an offer to demystify the code to people so that they have an opportunity to see how it really does make the whole compiler simpler and easier to work with.

If people express interest, I'll run such a live session and let people judge for themselves what they think of the code and my approach to "simplicity" after they've been introduced personally to the code base.

dang · 9y ago

That's a great idea. If you'd be interested in doing this semi-officially on HN (maybe something along the lines of an AMA) please email hn@ycombinator.com and let's co-ordinate it!

arcfide · 9y ago

Done.

jpt4 · OP · 9y ago

I would observe such a live session.

usr381 · 9y ago

Definitely interested. I've dabbled in k for small problems and would like to understand more about the array approach to larger programs.

arcfide · 9y ago

The official live stream is up now here:

https://news.ycombinator.com/item?id=13638086

arcfide · 9y ago

Update. I'll be giving this live session on Monday the 13th at 3 PM EST. Stay tuned for a post on HN near that time for the link to the live session.

credit_guy · 9y ago

Maybe it's a bit late to ask, but is the live session still planned to happen? I'd personally want to dial in, but I don't know the details.

RodgerTheGreat · 9y ago

Sounds great! If you do so, please be sure to record it for posterity.

whiskers · 9y ago

I'd also be really interested in this - it sounds intriguing!

chetanbhasin · 9y ago

I'd be down for such a session. Sounds like a great idea!

camelspade · 9y ago

I would like to see this as well, sounds very interesting

vnz · 9y ago

I am interested too.

jaccarmac · 9y ago

Please do!

kutkloon7 · 9y ago

I'm not sure if everyone who's interested knows APL... I was initially interested, but honestly a bit less when I saw the compiler is in APL. Still a very impressive feat, just not my cup of tea.

arcfide · 9y ago

Understandable. However, it won't be necessary to know APL to understand what I'm going to talk about. I might delve into a bit of APL, but only from the perspective of how to work with the code. My hope is that the more general lessons can be taken and used anywhere, and improve the code of projects elsewhere.

burgerdev · 9y ago · 8 in thread

At first I was wondering how he managed to write a compiler in 750 loc. Then I noticed it's for APL, which I would call terse:

  Y0←{⊃,/((⍳≢⊃n⍵)((⊣sts¨(⊃l),¨∘⊃s),'}',nl,⊣ste¨(⊃n)var¨∘⊃r)⍵),'}',nl}

Silhouette · 9y ago

Does anyone here program APL? I've tried to look into it occasionally because the idea of powerful, concise syntax appeals to me, but the unfamiliar syntax was always too much to get my head around within a reasonable amount of time. I'm curious to know whether it really does become second nature after a while, in the same way that some of us might read a printf format string or regular expression quite fluently after many years of working with them.

RodgerTheGreat9y ago

I program in K, a close relative, and I have done some tinkering with APL. The symbols actually don't take long to memorize- perhaps a few days of practice. It's a bit like learning to read prose. At first you have to sound out words letter by letter, but eventually you're able to "see" words and phrases built out of common patterns of symbols. I see ,/f' and think flatmap, ~~': and think heads of uniform runs, {x@<x} and think sort up, etc.

A dense expression can still take a while to puzzle out sometimes, but certainly no longer than the equivalent logic spelled out in a more verbose language across many lines.
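For readers who don't know K, here is a rough JavaScript sketch of what those three idiom names describe. The function names are illustrative, and "heads of uniform runs" is read here as "mark where each run of equal values starts", which is an assumption about the phrase rather than a translation of the K expression. The sort follows the shape of {x@<x}: compute the ascending order of indices (grade up), then index the list with it.

```javascript
// flatmap: apply f to each item, then join the resulting lists into one.
const flatMap = (f, xs) => xs.flatMap(f);

// Grade up: the indices of xs arranged so that the values ascend.
const gradeUp = (xs) => xs.map((_, i) => i).sort((i, j) => xs[i] - xs[j]);

// "Sort up": index xs by its own grade, leaving the input untouched.
const sortUp = (xs) => gradeUp(xs).map((i) => xs[i]);

// Heads of uniform runs (assumed meaning): true where a run of equal values starts.
const runHeads = (xs) => xs.map((x, i) => i === 0 || x !== xs[i - 1]);

console.log(flatMap((x) => [x, x * 10], [1, 2, 3])); // [1, 10, 2, 20, 3, 30]
console.log(sortUp([3, 1, 2]));                      // [1, 2, 3]
console.log(runHeads([1, 1, 2, 2, 2, 3]));           // [true, false, true, false, false, true]
```

The JavaScript versions each take a line or two; the point of the comment above is that in K the whole idiom is short enough to be recognized at a glance, like a word.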

kbob9y ago

I programmed in APL circa 1980-83. It's not a normal programming language in that you don't use it to solve the same kinds of problems as you'd solve in Pascal or Fortran (then) or Python or C (now). Lisp and Smalltalk are a little like that. If you pick a problem they're good at, they're very good indeed, but if you're writing something they don't do, they're just Pascal/Python with funny syntax and nonstandard libraries.

APL really encourages code golfing. It's barely readable in the best of circumstances, so there's a lot of temptation to compress code. Er, it's readable, but it's like reading a regular expression. You read it character by character, not line by line.

The other thing APL encourages is writing dimensionality independent code. Just as a good C programmer will write a function to concatenate two strings that works independent of the strings' sizes, a good APL programmer will write a function to accept arrays of arbitrary dimensionality where that can make sense. That's because it was not uncommon to have intermediate results of 3, 4 or 5 dimensions. Also, most of the built-in operators do something useful with higher-dimensioned inputs.
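As a loose illustration of dimensionality-independent code outside APL, here is a small JavaScript sketch of an addition function that accepts scalars, flat arrays, or nested arrays of any depth. It is only an analogy: `add` is a hypothetical helper, nested JavaScript arrays stand in for APL's true multi-dimensional arrays, and the sketch assumes that two array arguments have matching shapes without checking.

```javascript
// Add two values element by element, whatever their nesting depth.
// A scalar paired with an array is applied to every element of that array.
const add = (a, b) => {
  if (Array.isArray(a) && Array.isArray(b)) return a.map((item, i) => add(item, b[i]));
  if (Array.isArray(a)) return a.map((item) => add(item, b));
  if (Array.isArray(b)) return b.map((item) => add(a, item));
  return a + b;
};

console.log(add(1, 2));                                   // 3
console.log(add([1, 2, 3], 10));                          // [11, 12, 13]
console.log(add([[1, 2], [3, 4]], [[10, 20], [30, 40]])); // [[11, 22], [33, 44]]
```

The same definition serves every rank, so intermediate results with three or more levels need no extra code.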

So yes, it becomes second nature, more or less. But at the same time, it restricts the kinds of problems you think about solving with it.

Today I would not recommend APL for any purpose except studying its place in computing history.

arcfide9y ago

Actually, most of those 750 LoC are C++ code for the runtime, written in a style to match the semantic density of the rest of the code, as well as code for calling off to the various C compilers on various operating systems. The core compiler (between the parser and final code generator pass) is around 90 lines and is written in a particular style of APL code.

zzzcpan9y ago

He also replaces long names with short ones, so it's more like an obsession. First commit I clicked on was replacing "penv" with "p" just to make it shorter.

arcfide9y ago

There's a specific reason I made that switch, which for a long time had appeared to be a silly change. Eventually I realized that "penv" as a name was so different from the rest of the naming conventions that it was causing cognitive dissonance in my programming that was taking me out of the flow and making it more difficult to work with the code. Moving to the name "p" did shorten the code, but more importantly, it brought more consistency, predictability, and regularity into the code base. It is a case of synergizing simplicity and brevity and how they work together.

coldtea9y ago

That wouldn't affect line count.

userbinator9y ago

For something in a more... conventional language, here's a compiler/interpreter for a C subset in ~500 lines:

https://news.ycombinator.com/item?id=8558822

rakoo9y ago· 5 in thread

Reminds me of that good ol' folklore: http://www.folklore.org/StoryView.py?story=Negative_2000_Lin...

ScottBurson9y ago

That's a great story. I can't resist quoting Dijkstra: If we wish to count lines of code, we should not regard them as "lines produced", but as "lines spent" -- the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger.

arcfide9y ago

I am very inspired by Dijkstra's high level ideas on programming. Importantly, one of the fundamental assumptions of Dijkstra was that you could actually understand your code base and reason about it. The creation of excessive abstraction may create a degree of robustness that protects against programmers who don't understand the code base, under the assumption that no one will, but at the cost of eventually ensuring that no one will be able to understand the entirety of the code base or even reason at the macro and micro levels efficiently at the same time for a large part of the code base.

nickpsecurity9y ago

"the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger."

Never seen that. Thanks for sharing, as it might be a great way to drive the point home to business types. The amount of code certainly turns [for developers] from an asset into a liability as its size and usage grow. Hmm. Maybe such a presentation could always consider the amount of code a liability or neutral asset that produced benefits on the other side or reduced them. A well-stated connection between the two might justify reducing technical debt.

ythn9y ago

The kind of managers who use lines of code as a performance metric are the kinds of managers I avoid like the plague. It usually indicates that they don't understand what I am working on and are likely to reward code firefighters more than code surgeons.

EGreg9y ago

What do you think is a good measure of productivity?

In my opinion it is the number of requested features shipped minus the number of bugs introduced, each weighted by its importance, as collectively decided on by everyone or the client.
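A minimal sketch of that metric, assuming each feature and bug carries a numeric importance weight. The names and weights below are made up for illustration; how weights would actually be agreed on is left open.

```javascript
// Sum the importance weights of a list of items.
const weightedSum = (items) => items.reduce((total, item) => total + item.weight, 0);

// Weighted features shipped minus weighted bugs introduced.
const productivity = (featuresShipped, bugsIntroduced) =>
  weightedSum(featuresShipped) - weightedSum(bugsIntroduced);

// Hypothetical data: two features shipped and one bug introduced.
const features = [{ name: "export", weight: 5 }, { name: "search", weight: 3 }];
const bugs = [{ name: "crash on save", weight: 4 }];

console.log(productivity(features, bugs)); // 4
```

Note that lines of code do not appear anywhere in the calculation.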

finin9y ago· 5 in thread

I've found that when teaching, I sometimes work on an example program too much, producing what I think is elegant and compact code, but that the students find hard to understand. I suspect that the same may be true when I am collaborating with others on a program. There can be value in writing code in a straightforward, easy to comprehend style.

delinka9y ago

"Compact code" is not orthogonal to "less code." It is said that you don't truly understand the problem you're trying to solve until you've implemented it a few (3?) times. Once you begin to understand the problem, you can often find places that required no code: either an existing API solved that problem and you weren't aware; or perhaps you found a more 'pure' solution and you can remove much of your code. This does not mean that you need to write compact, unintelligible code.

Also consider: "In anything at all, perfection is finally attained not when there is no longer anything to add, but when there is no longer anything to take away, when a body has been stripped down to its nakedness."

Lately in my programming career, I find myself simplifying code, distilling it to solve the problem at hand, then clarifying the code (with good variable names, explicit comparisons to NULL/nil, fully demarcated if/else, small well-named functions, etc) so that future me can grasp it faster. This has the added benefit of pleasant peer review and getting new devs acquainted with the code.

chrisweekly9y ago

Nice post. Related note, I just recently found this gem of a Hickey talk about "Simple" vs "Easy": https://github.com/matthiasn/talk-transcripts/blob/master/Hi...

abecedarius9y ago

The clearest case of this I've seen was Norvig's Udacity CS212 class: in the third week there were despairing and even angry posts to the forum about his code to generate strings from regular expressions. (Here's a version of it, with small changes plus a regex parser, since Udacity started requiring login: https://github.com/darius/parson/blob/master/eg_regex.py)

But some people have also told me that was a great learning experience -- they spent many hours understanding this half page of code, and felt they'd really grown once they'd mastered it.

Silhouette9y ago

My experience has been similar. A good coding style for teaching, when the reader doesn't yet recognise the building blocks of a language or their idiomatic usage, is often very different to a good coding style for professional use, when the reader can be assumed to understand the concepts and idioms already.

akkartik9y ago

This affects more than just beginning programmers. Many best practices we teach programmers today help insiders manage a project but hinder understanding in newcomers to the project (even if they're familiar with the language, libraries, etc.). In a strange new project straight-line code is usually easier to follow than lots of indirection and abstractions. Comments are of limited value because most comments explain local features, but fail to put them in a global context. Build systems that automate a lot of work in our specialized industrial-strength setup turn out to be brittle on someone's laptop when running for the first time.

edblarney9y ago· 3 in thread

Smaller is better, but that does not mean 'fancy pants super dense cryptic code'.

I think 'simpler' would be a better term than 'smaller'.

Also - every line of code has a cost. A lot of cost. Maintenance of code and complexity is not only expensive, but it adds to the maintenance of other code.

So less code to solve the problem is almost always better.

arcfide9y ago

At the heart you are absolutely right. We're after simplicity and clarity. However, I have found that "small" really does make a difference, especially if you push yourself to be small on the macro, rather than micro level. If I just chose "simple," it is too easy to believe that it's "simple enough." If I force myself to maintain poetry-like smallness, then I'm not just able to get by with "simple enough" but have to seek macro levels of simplification that we can often fail to see when the code is so large that all we look at is the single, local view of a single function.

By forcing myself to ever greater degrees of ascetical code sizes, small, cute micro hacks in a given function don't work. At that point the "fancy pants" hacks fail, and I am forced to create macro simplifications that obviate the need for whole classes of programming techniques.

So, yes, we want simple, but it's about how we can push ourselves and our minds to get there.

edblarney9y ago

Yup, I agree on the 'smaller architecture' bit.

One more point: I find that there are a lot of very common things that we, as developers, have not 'standardized' on - but if we did, it would be beneficial.

The underscore/lodash JS libraries are great examples of this.

They are not just a bunch of 'helper functions' - they are really a series of new 'functional keywords' that in a way represent a new paradigm in software: we all get used to these 'mini patterns' and call them the same thing, and when used in code they can make things a lot simpler.

Map, reduce, find, each, pull, filter etc. etc. - at first glance it would seem compulsive to jam all these into some code - but once the developers are familiar with them ... guess what - they become almost part of the programming language itself.

So I think this is a pretty good example of a 'meta' way to facilitate simplicity: agree on names for very common patterns, and abstract them away with tools or linguistic constructs.
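To illustrate the 'mini patterns' point, here is the same small task written twice. The sketch uses JavaScript's built-in array methods rather than underscore/lodash themselves so that it runs on its own, and the `orders` data is invented for the example.

```javascript
const orders = [
  { customer: "ann", total: 30, paid: true },
  { customer: "bob", total: 20, paid: false },
  { customer: "cy", total: 50, paid: true },
];

// Hand-rolled loop: the reader has to trace the state to see the intent.
let paidTotalLoop = 0;
for (let i = 0; i < orders.length; i++) {
  if (orders[i].paid) {
    paidTotalLoop += orders[i].total;
  }
}

// Named patterns: filter, map and reduce each announce the step being taken.
const paidTotal = orders
  .filter((order) => order.paid)
  .map((order) => order.total)
  .reduce((sum, total) => sum + total, 0);

console.log(paidTotalLoop, paidTotal); // 80 80
```

Once a team shares the vocabulary, the second version reads almost like a sentence: keep the paid orders, take their totals, add them up.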

MaysonL9y ago

This reminds me of Alan Kay and company's attempt at VPRI to reduce a full PC OS to 20K loc, which produced some very sweet and concise code.

dude019y ago· 2 in thread

Woah! From the article: "added roughly 4,062,847 lines of code to the code base, and deleted roughly 3,753,677".
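A quick arithmetic check on those figures, taking the quoted totals at face value together with the 750-line size mentioned elsewhere in the thread:

```javascript
const added = 4062847;   // lines added, as quoted from the article
const deleted = 3753677; // lines deleted, as quoted from the article
const finalLines = 750;  // compiler size cited elsewhere in the thread

// Net change in the code base over its history.
console.log(added - deleted); // 309170

// Lines added for every line that survives in the final compiler.
console.log(Math.round(added / finalLines)); // 5417
```

The second number lines up with the "around 5400 lines" per surviving line that the article is quoted as giving.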

zzzcpan9y ago

This is not a good thing though; it means the language and abstractions are not expressive and not reusable enough. Self-hosting compilers, like the author's, feel wrong to me because of that: meta DSLs for compilers should serve as much better abstractions and save a lot of work.

arcfide9y ago

Except that your meta DSL probably isn't able to solve the problem that this compiler is solving, which is putting an entire compiler natively onto the GPU in a way that the code is actually maintainable in a "native GPU" version, rather than requiring translation from some other state.

This compiler has gone through many core paradigm shifts in an attempt to find an appropriate way to express a solution to the problems that it encountered. Each iteration revealed some new insight into how to solve the problem, but inevitably led to a need to rethink the system.

Now, the system is so expressive and capable that reusability isn't even an issue. At this point reusability is about as useful in the compiler as having a new word to represent the word "the". Why? Why not just write the? Anything else you could write is likely to create confounding layers of indirection and distance between definition and use in the code that will actually obscure clarity.

Instead, I take the intentional approach to make the code as "disposable" as possible. Why change a compiler pass that is two lines long when you can just rewrite it from scratch in less time? By leveraging a different aesthetic, architecture, and language, I'm able to have more expressivity by removing unnecessary abstraction and making it as easy as possible to re-engineer the whole thing at the drop of a hat. This means that I never have to "live with" code bloat or some design decision that's annoying me. The cost to re-engineer is so low that I have almost no technical debt. If an architecture fails to scale, replace it and move on, without any loss of productivity, and a net gain since the code gets easier and easier to work with on each iteration.

jfoutz9y ago· 1 in thread

As pointed out in PAIP, clarity and concision are at odds. It takes good taste to balance the two.

BurningFrog9y ago

I think it also takes empathy. In the sense that you can imagine how the code would read to someone else, who was new to it.

franciscop9y ago

The numbers are nothing like this, but I had a really similar experience to the author when doing Umbrella JS. With some exceptions, I've tried to keep every function down to a few lines of code by doing heavy code reuse:

    // src/addclass/addclass.js
    // Add class(es) to the matched nodes
    u.prototype.addClass = function () {
      return this.eacharg(arguments, function (el, name) {
        el.classList.add(name);
      });
    };
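How a shared helper can keep each public method that short is sketched below. This is not Umbrella's actual source: `Collection`, `eachArg` and the argument handling are hypothetical stand-ins, and the fake nodes use a `Set` as their `classList` (hence `delete` where a real DOM `classList` has `remove`) so the sketch runs without a browser.

```javascript
// Hypothetical stand-in for a collection wrapper; not Umbrella's real code.
function Collection(nodes) {
  this.nodes = nodes;
}

// Shared helper: turn the arguments into a flat list of names, then call
// fn(node, name) for every node/name pair. Returns the collection to allow chaining.
Collection.prototype.eachArg = function (args, fn) {
  const names = Array.from(args)
    .flat()
    .flatMap((arg) => String(arg).split(/\s+/))
    .filter((name) => name.length > 0);
  this.nodes.forEach((node) => names.forEach((name) => fn(node, name)));
  return this;
};

// With the helper in place, each public method has a one-line body.
Collection.prototype.addClass = function () {
  return this.eachArg(arguments, (node, name) => node.classList.add(name));
};
Collection.prototype.removeClass = function () {
  return this.eachArg(arguments, (node, name) => node.classList.delete(name));
};

// Fake nodes whose classList is a Set, so no DOM is needed.
const nodes = [{ classList: new Set() }, { classList: new Set() }];
new Collection(nodes).addClass("a b", ["c"]).removeClass("b");
console.log([...nodes[0].classList]); // ["a", "c"]
```

All of the argument parsing lives in one place, which is where the reuse comes from.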

While they don't do exactly the same thing (Umbrella JS is more flexible but jQuery supports IE9), compare that to jQuery's addClass():

    addClass: function( value ) {
    	var classes, elem, cur, curValue, clazz, j, finalValue,
    		i = 0;

    	if ( jQuery.isFunction( value ) ) {
    		return this.each( function( j ) {
    			jQuery( this ).addClass( value.call( this, j, getClass( this ) ) );
    		} );
    	}

    	if ( typeof value === "string" && value ) {
    		classes = value.match( rnothtmlwhite ) || [];

    		while ( ( elem = this[ i++ ] ) ) {
    			curValue = getClass( elem );
    			cur = elem.nodeType === 1 && ( " " + stripAndCollapse( curValue ) + " " );

    			if ( cur ) {
    				j = 0;
    				while ( ( clazz = classes[ j++ ] ) ) {
    					if ( cur.indexOf( " " + clazz + " " ) < 0 ) {
    						cur += clazz + " ";
    					}
    				}

    				// Only assign if different to avoid unneeded rendering.
    				finalValue = stripAndCollapse( cur );
    				if ( curValue !== finalValue ) {
    					elem.setAttribute( "class", finalValue );
    				}
    			}
    		}
    	}

    	return this;
    },

jcoffland9y ago

It's interesting to note that the author has written more lines here in this thread than are contained in the compiler in question. The English language is not nearly as concise as APL.

skybrian9y ago

It seems like there is a missing explanation of the language this compiler compiles and why someone would want to use it? (Searches on "dfns" and "co-dfns" don't find much.)

fourier9y ago

Here is the link: https://www.youtube.com/watch?v=gcUWTa16Jc0 and proper q/a thread: https://news.ycombinator.com/item?id=13638086

nattaylor9y ago

>for every one of those 750 lines, I've had to examine, rework, and reject around 5400 lines of code.

I guess there's no such thing as "good enough" with a compiler?

Those are staggering numbers to me. Kudos to the author.

arcfide9y ago

The live session is up and running now. You can find more information about the stream and ask your questions at the following post:

https://news.ycombinator.com/item?id=13638086

known9y ago

AKA https://en.wikipedia.org/wiki/Pareto_principle

n0mad019y ago

That's roughly 1369 loc added per commit or 1855 loc per day.
