In C, every call to the allocator is explicit, that is, if you are using an allocator at all. Compare that to old-school C++, with new/delete and raw pointers, where you may call the allocator explicitly, but still, a lot happens in destructors, automatically. In modern C++, with smart pointers, it is essentially like a garbage-collected language in the sense that allocation and deallocation all happen automatically.
OK, but even with C we can't actually manage memory at the low level the processor does. You can't tell the processor what to keep or not at what level of cache, what to send to virtual memory, etc. It's lower level than, say, Python, but I don't think it is low-level memory management in the way PDP-11 C was.
The fact that a modern memory load operation involves cache, protection, memory mapping, etc. is not a property of the language, but rather of the environment (CPU + OS).
But those aren't abstractions that we can treat as black boxes, we need to know them and how to code taking them into account without actually having control inside the black box.
0. https://www.coreboot.org/data/yhlu/cache_as_ram_lb_09142006....
Neither can assembler, so it is a useless distinction.
Even Rust developers like myself labour within the fantasy that a pointer is, y'know, like an address to memory, a real "physical" thing. Rust (and to some extent C++) introduces some management abstractions in front of this in the form of references and borrowing, but the main concept is still there.
In reality the kernel of your operating system has put a giant layer between you and the physical memory, and the "address" and "pointer" are really just handles behind which the OS and MMU do all sorts of shenanigans.
"Raw pointers" really aren't raw. They're handles to offsets within pages, which can be all over the place. It would be entirely possible to walk away from the libc & C model entirely and work in a world of pure references interacting directly with VM subsystem pages as some kind of "object handles" and be much closer to the actual operation of the underlying system.
So C can do actual memory management, your OS or hardware just won't let you. I've done programming for audio effects gear where memory is directly accessible by real address. Often with different memory chips with different performance characteristics (for cost reasons) corresponding to different pointer value ranges. Just because your machine won't let you do it doesn't mean C isn't capable of it.
Meanwhile, virtual memory is as close as you can get down to the physical hardware in terms of normal CPU instructions (i.e., not VM management code). VM as a concept is orthogonal to raw pointers, which can be either virtual or physical.
Raw pointers are nothing like handles. They need to be manually "synchronized" properly with VM management (which happens completely behind the scenes for 99.99% of userspace code) to make sense, but it's not like there is bookkeeping overhead in copying or offsetting a pointer, like there would be for a "handle".
The point of a handle is that it's used to hold objects, to keep them alive. Raw pointers don't do that.
Is this possible in Ring 3? Or would everyone be running in kernel mode at that point.
Even if you do away with that layer, then there may still be a hypervisor lying to the kernel about memory.
You don't get direct access to the stack in C either. Stack frames are abstracted away, and you only get longjmp.
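For anyone who hasn't used it, here is a minimal sketch of what "you only get longjmp" looks like in practice. The names `recovery_point`, `failure_code`, `fail_deep` and `run_with_recovery` are made up for illustration. The program never inspects a stack frame; it can only ask the implementation to discard every frame above the `setjmp` call.

```c
#include <setjmp.h>

/* Hypothetical names, for illustration only. */
static jmp_buf recovery_point;
static int failure_code;

/* Recurse `depth` frames down, then bail out without returning
   through any of the intermediate frames. */
static void fail_deep(unsigned depth)
{
    if (depth == 0) {
        failure_code = 42;
        longjmp(recovery_point, 1);   /* jump back to the setjmp call */
    }
    fail_deep(depth - 1);
}

/* Returns the failure code recorded before the jump. */
int run_with_recovery(unsigned depth)
{
    if (setjmp(recovery_point) == 0) {
        fail_deep(depth);
        return 0;   /* not reached in this sketch: fail_deep always jumps */
    }
    return failure_code;
}
```

The failure code travels through a static variable because the standard only allows `setjmp` to appear in a few restricted contexts, such as a comparison against a constant.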
If you pay attention to Undefined Behavior and strict aliasing, you don't even get that much access to poking around memory.
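A small sketch of the strict aliasing point, assuming `float` is a 32-bit IEEE 754 type (true on mainstream platforms, but an assumption nonetheless). Reading a `float` object through a `uint32_t *` is undefined behavior, so the sanctioned way to poke at the bits is to copy them. `float_bits` is a name invented for this example.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t),
               "this sketch assumes a 32-bit float");

/* Returns the object representation of `f` as an integer.
   Writing *(uint32_t *)&f instead would violate strict aliasing. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* compilers typically turn this into a single move */
    return bits;
}
```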
In fact, there is hardly any difference between VMS BASIC and VMS C in terms of what is possible, if we want to take the discussion outside of 8 bit versions.
C can't even do all of integral arithmetic safely. It's a language that goes really out of its way to add unsafety.
But integer arithmetic is safe in terms of Rust.
> Checking every addition adds a massive slowdown
It only does so for debug mode. In release mode, it uses modular arithmetic.
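For comparison, this is roughly what the two behaviors look like when written out by hand in C. It is a sketch, not what any particular compiler emits, and `checked_add` and `wrapping_add` are hypothetical helper names. Signed overflow is undefined in C, so the check has to happen before the addition; unsigned arithmetic wraps by definition.

```c
#include <limits.h>
#include <stdbool.h>

/* "Debug mode" flavor: refuse to add if the result would not fit in an int. */
bool checked_add(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;          /* would overflow; *out is left untouched */
    *out = a + b;
    return true;
}

/* "Release mode" flavor: unsigned addition is defined to wrap
   modulo UINT_MAX + 1, so no check is needed. */
unsigned wrapping_add(unsigned a, unsigned b)
{
    return a + b;
}
```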
Your comment reads like nonsense. Are you able to provide what you feel is the best example that substantiates your claim?
So if you have a = -1, b = 1000 and compare the two, a > b is actually true.
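A sketch of that comparison, assuming the surprising case is the one where `a` is an `int` and `b` is an `unsigned int` (with two `int` operands the result is the expected one). The function names are made up for the example.

```c
#include <stdbool.h>

/* The usual arithmetic conversions turn `a` into an unsigned int first,
   so -1 becomes UINT_MAX before the comparison happens. */
bool mixed_greater(int a, unsigned b)
{
    return a > b;
}

/* Same comparison with both operands signed, for contrast. */
bool signed_greater(int a, int b)
{
    return a > b;
}
```

`mixed_greater(-1, 1000u)` is true, while `signed_greater(-1, 1000)` is false. GCC and Clang can flag the first form with `-Wsign-compare`.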
Manual memory management isn't that much faster than a modern GC, sometimes even slower. I'd argue that C programs are typically fast because there is just less rope to hang yourself with, leaving aside memory safety.
The anemic abstractions provided in the language and the tiny stdlib means it takes a lot of work to achieve something, so developers simply do less. There isn't even a Hashmap (or a proper String), while in Kotlin, you can perform a deep copy of the object graph and convert it to json in parallel in a single line if you so wish.
> The anemic abstractions provided in the language and the tiny stdlib means it takes a lot of work to achieve something
Which has the additional effect of forcing you to be a bit smarter about how you do things, to be less wasteful. It forces you to contend with everything you want to do, to consider it and the cost associated with it. Built-in, general-case abstractions are nice when under time constraint and hacking something together, but it doesn't make for good software. Not only is it almost guaranteed to be slower than a properly constructed purpose-built solution, but it also removes your view from thinking about the cost of every single thing you're doing. It makes it easier and attractive to overuse abstractions, to over-engineer solutions, and to approach problems from a standpoint where you simply throw the kitchen sink at the problem because that's the only thing you can think of.
That is kind of misleading. The difference is that C and Rust support stack allocation, which is essentially an arena style allocator integrated into the language. What the fancy pointer bumping GC runtimes do, the stack does by default. The problem is that escape analysis is difficult and it is difficult to prove that an access to memory on the stack is safe without fundamentally changing the language like Rust does. It gets worse on the heap, where you can have runtime determined ownership.
C programmers like their doubly linked lists, but when you think about it, it is actually kind of a difficult problem to formalize and analyze in its full generality.
Control over memory layout and manually allocating and freeing memory are orthogonal issues.
I can optimize memory layout in Java too by using primitive data types instead of pointer-chasing objects, or structs of arrays vs arrays of structs type of things in order to improve access patterns. I can't control alignment and padding, except indirectly, that's true, but that is not what people mean when they say "manual memory management". Rust gives you control over memory layout, but has "automatic" memory management.
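To make the "structs of arrays vs arrays of structs" distinction concrete, here is a sketch in C rather than Java. The particle types and function names are invented for illustration. Both functions compute the same sum; only the memory layout they walk over differs, and neither involves a single malloc or free.

```c
#include <stddef.h>

enum { MAX_PARTICLES = 1024 };

/* Array of structs: each particle's fields sit next to each other. */
struct particle { float x, y, z, mass; };

/* Struct of arrays: each field gets its own contiguous column. */
struct particle_columns {
    float x[MAX_PARTICLES];
    float y[MAX_PARTICLES];
    float z[MAX_PARTICLES];
    float mass[MAX_PARTICLES];
};

/* Touches one float out of every four that pass through the cache. */
float total_mass_aos(const struct particle *p, size_t n)
{
    float total = 0.0f;
    for (size_t i = 0; i < n; i++)
        total += p[i].mass;
    return total;
}

/* Reads only the mass column, so every loaded cache line is fully used. */
float total_mass_soa(const struct particle_columns *p, size_t n)
{
    float total = 0.0f;
    for (size_t i = 0; i < n && i < (size_t)MAX_PARTICLES; i++)
        total += p->mass[i];
    return total;
}
```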
> forcing you to be a bit smarter about how you do things, to be less wasteful
Yes this is what I meant.
For balance, the faster machines get, the more problems are most effectively solved by throwing the kitchen sink at them.
> controlling memory layout
Garbage collectors can compact active memory into one contiguous location and adjust the active pointers to point there instead. You can't do this in a language like C, because you can have arbitrary pointers to anything, and there's no runtime indication of what's a pointer or just an integer. You simply have to prevent memory fragmentation in the first place, which also complicates the logic of the program.
For faster allocation in C, arena allocation based on object lifetimes can be used [1]; in generational garbage collectors, you get similar benefits, but it's just done automatically. In fact, in that linked paper, they found that lifetime-based arena allocation improved the speed of their program (a C compiler) at the cost of increased memory allocation compared to naïve malloc() and free(), which is exactly what garbage collection does.
As a result of compaction, memory allocation with garbage collection is just a pointer bump in the best case, whereas allocation with just malloc usually requires searching a free list or a tree.
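A minimal sketch of the pointer-bump allocation described above, written as a hand-rolled arena in C. This is not the allocator from the linked paper; `struct arena` and the `arena_*` functions are hypothetical names. It assumes the caller hands in a buffer that is itself aligned for any object type.

```c
#include <stddef.h>

/* A fixed-capacity arena: allocation is a bounds check plus a pointer bump. */
struct arena {
    unsigned char *base;
    size_t cap;
    size_t used;
};

void arena_init(struct arena *a, void *buf, size_t cap)
{
    a->base = buf;   /* assumed to be suitably aligned for any type */
    a->cap = cap;
    a->used = 0;
}

/* Returns NULL when the arena is full; there is no per-object free. */
void *arena_alloc(struct arena *a, size_t size)
{
    const size_t align = _Alignof(max_align_t);
    size_t start = (a->used + align - 1) & ~(align - 1);   /* round up */
    if (start > a->cap || size > a->cap - start)
        return NULL;
    a->used = start + size;
    return a->base + start;
}

/* Frees every object at once, which is the "lifetime" part of the idea. */
void arena_reset(struct arena *a)
{
    a->used = 0;
}
```

The trade-off matches the one mentioned above: nothing is reclaimed until the whole arena is reset, so peak memory use goes up in exchange for cheap allocation.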
Every year people who do not understand why C is so successful try to make a name for themselves by breaking what makes the language great (such as the completely agnostic set of control structures). C has successfully remained portable and performant for 50+ years because of its flat memory model (with a few tweaks such as "volatile").
It's a pointless discussion when you don't care to explain how you use the words that obviously have many related but different ways to interpret them.
If you’re new to the language and want to understand how to use it like a pro then ignore this post - it will only confuse you and reduce your ability to use C effectively.
I would agree the title is a bit rhetorically rough, though, because being the wrong low-level language doesn't make it a high-level language. WASM would similarly be "wrong" if I claimed it was a direct mapping to modern hardware, but that doesn't make it "high level".
(Although what really frustrates me about C isn't that it's a bad mapping per se. It's from the 1970s, what do you expect? And it is obviously still quite useful for many cases. What frustrates me is that it continues to a large degree to dictate language design and heavily color how language designers see hardware, so too much modern language design is still just reshuffling bits of C around, rather than building languages that work with the hardware well.)
I don't think I've ever really seen a good argument for which developments were prevented by the existence of C as an important compiled language. The one claim I can remember I find ridiculous: that today's CPUs execute instructions in parallel, not serially. Well, for one, C's semantics aren't that serial; there is a large degree of freedom for compilers and CPUs in how to schedule the execution of C expressions and statements. Then, there are SIMD instructions exploiting those capabilities explicitly. But also, the rest of the code gets automatically pipelined by the CPU, according to a specific CPU's capabilities. Even though that stuff happens in parallel, any instruction encoding is by necessity serial. Or is anyone proposing we should switch to higher-dimensional code (and address spaces)?
I liken it to doing imperative UI development on top of the DOM abstraction in a browser. Yes, under the hood, the browser is choosing when to re-evaluate and repaint interface elements, but you can't touch any of that; you're instead rearranging things in the DOM and memorizing heuristics the browsers use to try and trick the browsers into matching changes to the DOM to visual changes in the browser UI efficiently.
It may very well be time for low-level languages to encourage us to think about programming as "arranging independent blocks of code that can be executed in parallel, with only a handful of sequencing operations enforcing some kind of dependency order. Apart from honoring those sequencing requirements, order of execution or whether execution happens in parallel is undefined."
You just looked in your fish tank and declared what the weather is going to be like in the Atlantic ocean... Like... these things have nothing to do with each other. The fact that C has functions or structs has nothing to do with it being awful influence on designing hardware.
Here are some reasons why C is awful.
* It believes that volatile storage is uniform in terms of latency and throughput. This results in operating systems written with the same stupid idea: they only give you one system call to ask for memory, and you cannot tell what kind of memory you want. This in turn results in hardware being designed in such a way that an operating system can create the worthless "abstraction" of uniform random-access memory. And then you have swap, pmem, GPU memory, etc. And none of that has any good interface. And these are the products that, despite the archaic and irrelevant concept of how computers are built, have succeeded to a degree... Imagine all those which didn't. Imagine those that weren't even conceived of because the authors dismissed the very notion before giving the idea any thought.
* It has no concept of parallelism. In its newer iterations it added atomics, but this is just a reflection of how hardware was coping with C's lack of any way to deal with parallel code execution. C "imagines" a computer to have a CPU with a single core running a single thread, and that's where the program is executed. This notion pushes hardware designers towards the pretense that computers are single-threaded. No matter how many components your computer has that can actually compute, whenever you write your program in C, you implicitly understand that it's going to run on this one and only CPU. (And then e.g. CUDA struggles with its idea of loading code to be executed elsewhere, which it has to do in some very cumbersome and hard-to-understand way, which definitely doesn't rely on any of C's own mechanisms.)
I thought about the implications of a "parallel" statement, where everything is assumed to execute in parallel and oh boy are the implications big. C's semantics are serial but they contain implicit parallelism. The equivalent is that the parallel statement contains implicit sequentialism that the compiler can exploit to reduce the amount of book keeping needed by the CPU to schedule thousands of instructions at the same time. E.g. instead of having an explicit ready signal and blocking on it, the compiler can simply decide to split the parallel statement into two parallel statements, one executed after the other. Implicit sequentialism! A parallel statement implies that no aliasing writes are allowed to be performed. I don't know what the analysis for that would look like, but in many common cases I would expect the parallel statement to be autovectorized quite reliably.
>Even though that stuff happens in parallel, any instruction encoding is by necessity serial. Or is anyone proposing we should switch to higher-dimensional code (and address spaces)?
Uh, you know we can just encode the program as a graph? Graph reduction machines are a thing, you know.
That is sort of a thing: https://en.m.wikipedia.org/wiki/Very_long_instruction_word
If you have multiple instructions grouped together like this you could think of it as being a 2D array of instructions
However, before dismissing this all as a bad mapping to an outdated 1970s model of computation, I'd like to see a good alternative. CUDA has clearly shown that there's an acceptable model for massively parallel data sets, but that doesn't handle branch heavy code very well at all. And FPGAs have a different approach for a completely different kind of problem, but I don't know how you would expose what Apple, AMD, or Intel chips are doing under the hood and have it be at all manageable to the programmer. How is someone supposed to indicate what's next when a pipeline stalls waiting on the previous operation or a cache miss? Is the programmer going to toss micro ops into separate execution units and wait for the results to come out the other side in arbitrary order? Is this an async/await model for every addition or memory fetch? I think it would be complete spaghetti to even try, but I'd love to be shown I'm wrong.
People get all excited trash talking Itanium, but I think it's a lesson that if you try to expose any alternative to the 1970s model they'll just bitch about how there are no sufficiently smart compilers. And of course it got scooped by AMD64 pretending to execute one instruction after another.
And if there isn't a good alternative, I think C (or Rust, or WASM) are a pretty good fit for what you've actually got to work with at the low level.
I have a pet theory that lisp macros over an assembler is the right high level language for systems programming but that hasn't made it off the whiteboard yet.
So, leaving assembly aside, any language can have hardware capabilities exposed as compiler intrinsics. There is nothing special about C in that regard; it is only the one most people are aware of, because they don't tend to be educated in compilers.
One of these days I really need to post my "ideas for languages" that I've got banging around on my hard drive, but one of them is "a language that deals with the increasingly heterogeneous nature of the computer". You've got the CPU, the GPU, efficiency cores, who knows what else in the future (NN cores), and it's only a small hop from there to consider other computers as resources too.
Full disclosure: I have no idea whatsoever what this looks like. Especially in light of the fact that you need to build not just for the exact machine you're developing on but for machines in the future as well. Some sort of model of what is being computed and some guestimate at the costs? (Something like an SQL query builder where you declare your goal and it does the computation about what resources to compute it with?) It's also possible that the huge gulfs in performance between all these parts are just too large to bridge and manual scheduling of all these resources is just the only choice.
Even just within a CPU it's rather annoyingly difficult to use vector-based code in modern languages. Perhaps something like an array-based language, but one that discards that field's bizarre love affair with single-character (if not outright Unicode) operators and can be read by a normal human, and just affords writing code in a style that SIMD becomes a sensible default rather than something the optimizer laboriously reverse engineers from your conventional imperative code. (Array based programming could really use a "for humans" version of those languages in general.)
To some extent, just sitting down for a year to learn modern assembler and starting from the very, very bottom once again to build a high level language, rather than starting with C and building "C, but ..." which is pretty much every modern language being developed, would be an interesting exercise if nothing else.
Another little example is I think Jai was supporting structures-of-arrays instead of arrays-of-structures, though I don't know if they kept it. I'd like to see a language where the language-level data structures are explicitly viewed through the lens of "how I serialize these into memory", rather than the data structure implicitly creating such a specification by how it is defined, so for instance you could swap out a SoA to an AoS by swapping only the way the compiler serializes to RAM and not any of the rest of the code. Obviously you provide defaults that look like modern languages, but with this you could directly implement things like tagged unions with custom bit layouts, or theoretically, directly accessing gzip'd data by specifying that this data structure can only be accessed sequentially but as long as that's what you do you don't need to directly unzip it, etc. This doesn't directly answer "how do you utilize modern hardware correctly" but gives you tools to potentially create a better match than what compilers give by default.
Again, to be clear, this is crazy pie-in-the-sky far out ideas that I do not have an implementation in mind for, but it's the sort of thing I'd like to see more experimentation with on the fringes of language dev. (And I only wish I had time to do it myself. Unfortunately, I simply do not.)
(And, as the sibling comments point out, yeah, assembler technically, but that's kind of a cop out.)
Isn't C the language (x86_64) processors are designed to be fast for? Sure they added a large amount of abstractions but since they were made for C is there any language where the processor doesn't have to laboriously emulate?
Yup
I mean they also optimize for Java and JS and .NET and probably Swift and Rust.
But C still takes precedence, I bet
Nope. They compete on performance in C++ (games mostly), Java (enterprise SKUs, but same core architecture), and JavaScript (browser benchmarks even though raw JS performance is a very small part of browser responsiveness...)
If you actually read the spec, you would have found that they didn't "forget" these.
They carefully studied them and judged the encoding space is better used elsewhere.
I agree with you, but if you could convince WG14 to remove a lot of the stupid UB, that would be closer to the case.
(I know you're trying from your "One Word Broke C" article. Which, by the way, is putting up a server error right now.)
And it never was!
Just keeping this point in mind would reduce the plethora of discussions about undefined behaviour to the essential, i.e. the useful discussions, i.e. the 0.1%.
Then you're GTG
I think it's more useful to say that C is a portable assembler, than it is to say that it isn't, considering how it's used in practice and the sort of nasty things C compilers do in order to make that possible.
I don't think the author's point is that "C is not a good language for systems programming." You are not going to have an equivalent to volatile int *dma_register = SCATTER_GATHER_BASE; in Haskell.
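For readers who haven't written driver code, a hedged sketch of what that one-liner is for. On real hardware `SCATTER_GATHER_BASE` would be an address taken from the chip's datasheet; here an ordinary variable stands in for the device register so the snippet can run inside a normal process. `fake_device_register`, `dma_start` and `dma_status` are invented names.

```c
#include <stdint.h>

/* Stand-in for a memory-mapped register. On a real device this would be
   something like ((volatile uint32_t *)0x40001000), with a made-up
   address used here purely as an example. */
static uint32_t fake_device_register;
static volatile uint32_t *const dma_register = &fake_device_register;

/* `volatile` tells the compiler that every access is observable:
   the store below may not be elided, merged or reordered with
   other volatile accesses. */
void dma_start(uint32_t descriptor)
{
    *dma_register = descriptor;
}

/* Each call performs a real load, even inside a polling loop. */
uint32_t dma_status(void)
{
    return *dma_register;
}
```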
The author's point is that the drive to make C and other "model the von Neumann machine" languages execute quickly has made the compiler very complicated (the author is implying that "low level requires simple compiler") and that processors built to make such code run quickly are also very complicated. And those complications carry costs.
In many ways this is a "call to programming model action" and cites GPU as illustrating the potential when "new programming model" and "silicon to support it" are done in concert.
The original one is the one the article uses: low-level languages are non-portable and tied to the hardware on which they run, and high-level languages can target multiple platforms. Under this definition, C is absolutely a high-level language.
My complaint would not exactly be that the author is playing semantic games; it would be that they are clinging to archaic terminology in a way that does more to confuse than enlighten. The "generations" taxonomy is generally more descriptive.
1st: Machine
2nd: Assembly
3rd: General-purpose
4th: Application-specific
The 3rd/4th distinction gets a bit muddied sometimes, and back in the 80s and 90s people talked about a 5th generation that never really took off. But a couple (I think) clear examples of 4GLs are SQL, HyperCard, and Mathematica.
What I like about that approach is that it mostly breaks languages up according to fairly clear distinctions about when you would use them. And then we can use "high/low-level" as a relative term, where higher-level languages tend to do more to abstract away the details of what the computer is actually doing. That does mean that higher-generation languages tend to be higher-level; all we lose in doing it that way is the ability to have silly arguments about where to place a completely arbitrary (and, frankly, useless) dividing line.
I also like that this way we can recognize .NET IL, WebAssembly, and Java bytecode as very high-level 2nd generation languages, which, at the very least, is fun.
Oh, and Forth is a 3rd generation language. Fight me, Chuck.
I think this post goes way way way above boringness of day2day jobs.
Yeah, this post is not about how to use a hammer, but more like a curious consideration of whether using hammers everywhere is limiting us (C design).
Maybe it [EDIT: the post] is, but the title is obviously nowhere near accurate - if C is not a portable low-level language, what on earth is?
[1] It gets reposted everywhere so often I have read it multiple times, and the one thing in common I see is how every know-it-all crawls out of the woodwork to comment on the title, as if the title was something new, deep, profound or even correct.
If you use sys calls, work between different breeds of operating systems (UNIX, POSIX and Windows are not compatible with each other), you need to rewrite or wrap relevant parts, or write the relevant part beforehand inside ifdefs to be able "port" it between systems.
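A small sketch of the "write the relevant part inside ifdefs" approach, using the system page size as the example. `page_size` is a hypothetical wrapper name; `GetSystemInfo` and `sysconf` are the real Windows and POSIX calls.

```c
#ifdef _WIN32
#include <windows.h>
#else
#include <unistd.h>
#endif

/* One portable entry point, two platform-specific bodies. */
long page_size(void)
{
#ifdef _WIN32
    SYSTEM_INFO info;
    GetSystemInfo(&info);
    return (long)info.dwPageSize;
#else
    return sysconf(_SC_PAGESIZE);
#endif
}
```

The rest of the program only ever calls `page_size()`, so the part that has to be rewritten per platform stays confined to this one function.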
The gist of the piece is that hardware is evolving to please C's programming model, hiding all the complexities C is not aware of, and behaving like a PDP-11 on steroids. This is why we have a truckload of side-channel attacks on x86 to begin with: to "emulate" PDP-11s faster and faster.
This question doesn't have to have an answer. The author of TFA apparently believes that a low-level language is one that effectively and clearly exposes the execution model of the hardware to the programmer. Under this definition, no widespread language (except assembly) is truly low-level, and possibly none are.
Which, for what it's worth, is also what I was taught in school. C was consistently described as a high-level language by my professors, even if it is "lower-level" than almost everything else.
i.e. truly low-level languages can't be portable and are bound to the architecture.
Also a reminder that any language can have toolchains with extensions exposing low level features.
Even worse, adding a comment on how actually you shouldn’t be curious and understand how things really work.
Which basically then boils down to ISA design. If you could design an ISA from scratch for the hardware you design from scratch, what would you do? Well, there aren't that many options. Stack machine, dataflow machine, VLIW machine. All of those have been tried and the modern superscalar CPUs kick their butts on every metric except power.
The whole article kind of misses the point anyway. We should probably be running higher level languages for most things anyway, which shouldn't be overly constrained by hardware design. For everything else, 100% serious, there is WebAssembly, and hardware ISAs will fade below this level of abstraction in the fullness of time.
Except it doesn't really, the standard leaves most of the really machine-dependent parts undefined; only very few things are left implementation-defined.
Plus, of course, C is quite unsuitable for any platform that uses segmented memory/non-flat addresses (which are things that are trying to come back in vogue but C's wide spread really, really hinders that).
Well that's because it is low level and, especially, simple, and doesn't try to abstract things.
If your hardware is significantly different, it only looks low level. In reality plenty of mapping and conversion goes on behind the scenes - sometimes with hilarious consequences.
The C standard is a description of an abstract machine. You get UB and unexpected miscompilations, because the optimizer is not evaluating how your code runs on the machine you're compiling for, but simulates running your code on the weirdly abstract C machine, one that can't overflow signed integers.
And C abstracts away almost everything about stack, stack frames, and all the complexities of memory and cache hierarchies. They are abstracted to be uniform linear address space.
Incidentally, C never really became popular on 6502 because, arguably, that ISA is somewhat hostile towards efficient implementations of higher-level languages.
Either way, C is equipped ok for that - at least as well as most systems languages C++, Rust, etc. - simply because dealing with allocation and raw addressing (at least raw within the process memory space) is a fundamental part of the experience. Throw in a few compiler extensions (because you'll need to change the compiler to make use of this anyway) for things like where to locate static allocations and use library functions that add dynamic allocation in specific spaces. It will get hairy, but it's at least possible with some very careful programming.
And it is very reasonable if you are under 50 years of age, that you haven't heard of it.
COMFY-65 is a compiler for a small Lisp language that provides all non-branching operations of the 6502 processor as primitives (e.g. tests for carries, overflows, zero, and negative; set decimal arithmetic mode; etc.). However, programs still consist of subroutines, loops, and tests, with no "go to label" construct provided. It's surprisingly simple and, I would say, elegant.
Here's the PDF that outlines it: https://dl.acm.org/doi/pdf/10.1145/270941.270947
Also there is quite a lot in modern assembler that you can’t really get to from C, like prefetch and cache flushing instructions.
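Standard C indeed has no prefetch, but GCC and Clang expose one as the `__builtin_prefetch` extension. A sketch follows; the `PREFETCH_READ` macro, the distance of 16 elements and the function name are arbitrary choices for illustration, not tuned values. On other compilers the macro falls back to doing nothing.

```c
#include <stddef.h>

#if defined(__GNUC__) || defined(__clang__)
/* Arguments: address, 0 = prefetch for reading, 3 = keep in all cache levels. */
#define PREFETCH_READ(addr) __builtin_prefetch((addr), 0, 3)
#else
#define PREFETCH_READ(addr) ((void)0)   /* no portable equivalent */
#endif

enum { PREFETCH_DISTANCE = 16 };   /* arbitrary, untuned */

/* Sums the array while hinting that data a few iterations ahead
   will be needed soon. The hint never changes the result. */
long sum_with_prefetch(const int *data, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n)
            PREFETCH_READ(&data[i + PREFETCH_DISTANCE]);
        total += data[i];
    }
    return total;
}
```

For a linear scan like this one the hardware prefetcher usually does the job already; an explicit hint is more likely to pay off for access patterns that are regular but not sequential.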
LLVM or QBE, for example.
"C does not behave as a typical ‘high-level’ language, because it offers a number of features which are more normally associated with ‘low-level’ languages such as assembly language. These include the ability to write data to and from particular memory addresses, facilities for operations on the contents of memory locations, and instructions for incrementing and decrementing integer variables ... Thus C allows the programmer the flexibility and efficiency of working at low level with the advantages of working at high-level, for example the more advanced data structures and program flow controls typical of today’s computer languages. For this reason, C is sometimes described as a ‘high-level low-level language’ or as a ‘low-level high-level language’." - https://archive.org/details/computerprogramm0000ford/page/13...
The author argues that C's model does not fit the model he defined himself and claims to be the same model used by everyone.
After going through the article, I'm left with the impression that the author's thesis is flawed and relies on a series of strawmen arguments. Among the strawmen we find:
* arguing that speculative execution "were added to let C programmers continue to believe they were programming in a low-level language".
* claiming that modern processors are trying to emulate "the same abstract machine as a PDP-11"
* "Creating a new thread is a library operation known to be expensive, so processors wishing to keep their execution units busy running C code rely on ILP (instruction-level parallelism)."
* etc etc etc.
I don't think this opinion piece is grounded on reality, let alone is an objective take.
In your opinion, why was speculative execution added? It doesn't seem off base to suggest it was to enable programmers to continue writing single threaded applications while increasing execution speed.
In your opinion, what is wrong with the statement that modern processors are trying to emulate an abstract machine like PDP-11? To me it seems largely right.
I listened to a talk where developers used such a subsystem in the given processor. Without it, they would spend 95% of their time window just copying the data; by requesting the data ahead of time via that engine, they used only 10% of their time window to get the data and accomplished what they wanted in ~50% of the time window they had, leaving tons of time for further features and improvements.
If x86 had such a feature, I'd use it in my Ph.D. to request the matrix data I'm accessing ahead of time, because the pattern I use is not linear but well defined. Now, if I want to accelerate that code further, I need to reorder my matrices to make the prefetcher happy, and refactor the whole codebase from top to bottom.
https://stackoverflow.com/questions/48994494/how-to-properly...
That's the only example I'm aware of. Are there others? (I'm sure you could do it poorly if you wanted to, but how much history is there to extrapolate from?)
In general leaking microarch weirdness matters less if you don't have backwards compatibility.
To answer your question off the top of my head, answering different bits of the issue, from the perspective of the era of active programming language R&D not themes on themes on themes as we have now...
Limbo, Occam (Occam-pi, etc.), APL (I/J, Aplus, etc.), Oberon (Oberon 2, Oberon 07, Active Oberon, Zonnon)...
That was not my intention at all.
You asked what alternatives there were. C is a systems implementation language, designed to be compiled to object code that will run on the bare metal.
I offered some examples of alternatives to that role, as I thought you asked. I did say that they explored different aspects of the problem.
As I said to someone else upthread:
It does not need to be a relative statement in order to be correct.
The statement "C is not close to the instruction set of a modern CPU" does not need to be validated by specifying examples of languages that are closer.
https://www.youtube.com/watch?v=6lOnpQgn-9s
It's worth the time, IMHO, and I dislike video presentations. This one is different.
She designed the ARM processor (and BBC BASIC before that).
Actors, or more precisely active objects in Active Oberon, the only language from the Oberon lineage still actively being developed at ETHZ.
We currently have a problem where we can't have thousands of cores because, even today, so much code is designed to be fast on one core.
We really have to move to asynchronous programming, because synchronizing async hardware is both complex and inefficient.
RISC V is probably going to help since it allows for a lot of experimentation.
- Languages with "better" (=more modern hardware friendly) loop constraints are easier to parallelize (Fortran, Erlang, …)
- CPU architectures with better programmable vectorization (ARM SVE, Risc-V VE) are much easier to work with, if the language primitives allow it (see above)
Porting software over to Fortran/Erlang on aarch64 is something you can already do today, if you want to. Rust/Zig/etc. and RISC-V could have a good opportunity here to figure out better ergonomics for vectorization and more hardware-friendly cache coherency policies, too, but I have no clue whether anyone on the relevant standards committees cares.
In terms of "but what can I easily use as a drop-in replacement?" Yeah, we're kinda stuck with C and languages that inherit its problems (current Rust/Zig/etc. included).
https://github.com/wekan/wekan-node20#trying-to-compile-llvm...
C89 compiles to 30+ CPU/OS:
Indeed there is no direct match anymore between instructions and gate combinations on the processor die. There is microcode translating x86 instructions into whatever electronics are below. Change this microcode, and you could have your processor speak a different binary code (matched to an assembler language).
The real answer is: none. There are two problems, the first is you have to rewrite the world with the new language and hardware.
The second is, unfortunately, that language enthusiasts who are willing to rewrite the world AND can get the job done want a language that targets a sequential abstract machine (i.e. one that looks like C).
Is parallel programming hard? Without any further details or specifics, yes it is. It is far harder to conceptualize code instructions executing simultaneously, than one-at-a-time in a sequential order.
If I program (map inc [0 1 2 3]) is it really any more difficult to conceptualize the (inc ) function performing on each element sequentially than in parallel?
I think the difficulty of parallel programming is less innate and more twofold:
1) languages often default to sequential so to do async requires introducing additional primitives to the programmer
2) knowing when to effectively use parallel programming
When I have a list or stream that I know has independent elements that require wholly independent calculations then parallel programming is straightforward
Where people get hung up is trying to shoe horn async where it is either unnecessary (performance is equal or worse than sequential) or introduces breaking behavior (the computations are in fact interdependent).
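To make the (map inc ...) case concrete in C terms, here is a small sketch. The function name `map_inc` is made up for illustration, and the OpenMP pragma is an assumption about the toolchain (it only takes effect when the compiler is invoked with OpenMP enabled, e.g. `-fopenmp`); otherwise the same loop simply runs sequentially.

```c
#include <assert.h>
#include <stddef.h>

/* Rough C equivalent of (map inc xs): each output depends only on its
 * own input, so iterations may run in any order or all at once. */
void map_inc(const int *in, int *out, size_t n)
{
    assert(in != NULL && out != NULL);
#ifdef _OPENMP
    /* Only compiled in when OpenMP is enabled; splits the loop across threads. */
    #pragma omp parallel for
#endif
    for (long i = 0; i < (long)n; i++) {
        out[i] = in[i] + 1;
    }
}
```

Nothing about the loop body changes between the sequential and parallel builds, which is the sense in which independent elements make the parallel version easy to reason about. The moment iteration i reads `out[i - 1]`, that stops being true.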
(Fun fact: I once had someone call HR on me because they didn't know embarrassingly parallel was a technical term, and they thought I was belittling them)
While I'd probably argue that it is still more difficult to conceptualize, the statement we're discussing is presented as broad and general. I'd call it far less misleading if it said something like:
There is a common myth in software development that parallel programming *has* to be hard.
I don't think folks so much "shoe horn async where it is unnecessary" as the red/blue problem causes async code in most languages to spread.
Or by "async" do you just mean concurrent code? I'm reading "async" to mean lightweight coroutines or similar.
When you say "without further details or specifics" you're saying "using my default framework of a C / C-descendant world"
The author's point is that sequential programming is one type of simple programming, but it's not the only type, and it doesn't map easily to modern hardware
There's some irony to the fact that you re-interpret my opinion as being very specific to C and (indirectly) posit that - in that specific case - parallel programming is hard, and then yourself go on to select a very specific case where parallel programming is not hard, because some matrix operations are independent.
I agree that there are languages that are explicitly built to make parallel programming easy. But in general, and not just related to c or c descendant languages, parallel programming is hard.
The common myth - you're doing parallel programming? That sounds hard
It's not always hard. It really isn't! You don't need to be a genius or an expert to write parallel code.
Maybe where we're getting caught up is Cassie K's comment on ml engineering. You don't need to know how to build a microwave to use a microwave. In the same way, you don't need to be a genius or some deep expert in distributed systems to use abstractions that parallelize your programs
To write a parallel program does not require that you know what a mutex is. It just needs you to understand some simple algebraic (6-8th grade) properties about your functions (and, in fact, for library functions, they can be annotated as associative)
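A small sketch of the associativity point, with hypothetical helper names (`sum_range`, `sum_chunked`). It runs on one thread, but it shows why the property matters: because + is associative, the input can be cut into slices, each slice summed independently, and the partial results combined afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Plain left-to-right sum of xs[lo..hi-1]. */
uint64_t sum_range(const uint32_t *xs, size_t lo, size_t hi)
{
    uint64_t acc = 0;
    for (size_t i = lo; i < hi; i++) {
        acc += xs[i];
    }
    return acc;
}

/* Same sum, computed as independent partial sums over `chunks` slices.
 * Each call to sum_range touches a disjoint slice, so each could be
 * handed to a different worker; no locking is needed to combine them.
 * (Assumes n * chunks does not overflow size_t.) */
uint64_t sum_chunked(const uint32_t *xs, size_t n, size_t chunks)
{
    assert(xs != NULL && chunks > 0);
    uint64_t total = 0;
    for (size_t c = 0; c < chunks; c++) {
        size_t lo = n * c / chunks;
        size_t hi = n * (c + 1) / chunks;
        total += sum_range(xs, lo, hi);
    }
    return total;
}
```

The same trick does not carry over blindly to floating-point addition, which is not associative, so results there can differ slightly depending on how the input is split.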
There is a broad spectrum of parallel programs. Somebody using a web server implementation? They've made a parallel application
Somebody running tensorflow or pytorch? Also parallel! Even for simple stuff!
You could be a beginner programmer and be taught to make parallel programs without understanding distributed systems. It's not always hard. It's not generally hard. The complex bits are hard. The simple bits use 8th grade math.
That Erlang exists and people use it successfully does not mean that harder things aren't.
Or do you mean an individual can't do things in parallel?
Like.... Pushing all of those grocery carts in a long line is moving them in parallel
Or do you mean processing? Like thinking?
The statement "C is not close to the instruction set of a modern CPU" does not need to be validated by specifying examples of languages that are closer.
But no. That is what I meant when I said this is not a relative statement.
If the title said "C is not the lowest-level language" then your objection would be valid... but it doesn't and it's not saying that.
But before I go into some lengthy explanation: have you read the article, or are you responding to the title alone?
- have an "abstract" machine that is more concrete than C (and by extension less portable)
- be easier to lower into optimal assembly (especially loop ops)
- give you strong and precise compile-time guarantees about memory layout (padding, bitfields), variable sizes, register spilling, stack usage, etc.
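On the memory layout bullet: C11 cannot dictate padding, but it can at least refuse to compile when the layout is not what you expected. A minimal sketch, where `struct packet_header` is a made-up example type and the expected sizes are assumptions that hold on common ABIs:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Hypothetical wire-format header, used only for illustration. */
struct packet_header {
    uint8_t  version;
    uint8_t  flags;
    uint16_t length;
    uint32_t sequence;
};

/* The build fails if the compiler's layout differs from these
 * expectations, instead of the program misbehaving at run time. */
static_assert(sizeof(struct packet_header) == 8, "unexpected padding");
static_assert(offsetof(struct packet_header, length) == 2, "length moved");
static_assert(offsetof(struct packet_header, sequence) == 4, "sequence moved");

size_t packet_header_size(void)
{
    return sizeof(struct packet_header);
}
```

That is a check, not a guarantee: nothing comparable exists in standard C for register spilling or stack usage, which is part of what the list above is asking for.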
You could argue brainfuck is machine language for a theoretical infinite tape machine, but such a machine can only exist when implemented in high-level software.
Lower level is something that maps more directly to machine operation (assembly, maybe forth).
Higher level is something that has its own semantics of operation and needs to be converted into the machine operation; the more conversion, the higher the level.
LISP (1960), Smalltalk (1972), BASIC (1963), FORTRAN (1957), COBOL (1959) and countless others. Heck, ALGOL (1958, 1968) was much higher level than C too.
Nowadays a "high level language" is one where the person using it doesn't necessarily have to think about memory usage and allocation, since that's the task of a garbage collector - you accept a small amount of inefficiency in order to get a program that works "good enough" in 99.9% of all cases (since we're not on ancient devices anymore and most programmers don't write code that upsets the garbage collector in novel ways). By this criterion, Java, C#, Python, JavaScript, Ruby and so on are "high level languages" in that the programmer rarely has to think about this sort of thing; the underlying GC takes care of memory concerns. There's a reason you see these languages used more for end-user tools like webdev, scripting and desktop applications - the penalty is considered worth it (since it often ends up only shaving off milliseconds at most).
By contrast a low level language basically makes the programmer an active participant in memory management, with all the footguns that come with it. C and Rust are both two extremes of this - C just lets you do whatever, any form of memory control is up to you, segfaults included. Meanwhile Rust deliberately prevents you from doing anything that could possibly cause segfaults through its borrow checker. In some ways C can give you a lot more freedom to be efficient in how you allocate/deallocate your memory (or in the case of Rust - write code that is always memory safe), but you do trade things for it (in C you basically have to be really meticulous about free()-ing memory while in Rust you have to eat a lot of complexity upfront to not upset the borrow checker).
Also contrasting to high level languages, the modern domain of lower level languages tend to be things like drivers, kernels, RDBMSes and the like, rather than conventional user-facing applications (which it also was used for in the past since most of the previously mentioned languages are either pretty young or took quite some time to mature). Still useful, just a different set of expectations, since those are the components that have to be fast so the rest doesn't have to be as hyperefficient.
only if you malloc()/free() for every allocation/deallocation. if you use any other allocation strategy then this is never an issue.
for example: see the "Rewriting the memory management" section in this article: https://phoboslab.org/log/2023/08/rewriting-wipeout
> I'm not sure what the original PSX version did, but the PC version had a lot of malloc() and little fewer free() calls scattered around. Now I can assure you that the game doesn't leak any memory, because it never calls malloc().
> Instead, there's a fixed size statically allocated uint8_t hunk[MEM_HUNK_BYTES]; of 4mb that is used from both sides:
> A bump allocator takes bytes from the front of the hunk. This is used for everything that persists for many frames. When the game starts, it loads a bunch of assets that are needed everywhere (UI graphics, ship models and textures etc.) into this bump allocater and then remembers the high water mark of it. When you load a race track, it loads all assets needed on top. After finishing a race, the bump allocator is reset to the previous high water mark.
> On the other side, a temp allocator takes bytes from the end of the hunk. Temporary allocated objects need to be explicitly released again. This is used when loading a file into memory. The file is read at once and unpacked onto the bump allocated side. When done, the temp memory for the file is released again.
> Temporary objects are not allowed to persist over multiple frame. So each frame ends with a check to ensure that the temp allocator is empty.
> Somewhat related, the OpenGL renderer does the same with the textures: It bumps up texture memory (more precisely space in the texture atlas) and resets it to the previous level when a race ends.
if you use a system like this—either malloc() just once (or a few times) at the start of your program and then never manually free(), or just use statically-allocated arrays—then you never have to worry about "meticulous free()ing". I'm not sure why this never seems to be taught in early CS courses that teach C—it seems that basically everyone comes away thinking malloc()/free() OCD is the only way to manage memory with C, and is thus undesirable compared to the ease of use of garbage collection.
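A rough sketch of what such a two-ended hunk could look like. `hunk` and `MEM_HUNK_BYTES` come from the quoted article; every function name below is made up for illustration and is not claimed to match the game's actual code. The 8-byte rounding and the "release all temp memory at once" behaviour are simplifying assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MEM_HUNK_BYTES ((size_t)4 * 1024 * 1024)

static _Alignas(max_align_t) uint8_t hunk[MEM_HUNK_BYTES];
static size_t bump_len = 0;  /* bytes taken from the front */
static size_t temp_len = 0;  /* bytes taken from the back  */

/* Round up to 8 bytes so returned pointers suit common types. */
static size_t align8(size_t n) { return (n + 7u) & ~(size_t)7u; }

/* Long-lived allocation from the front; returns NULL when out of space. */
void *mem_bump(size_t size)
{
    if (size > MEM_HUNK_BYTES) return NULL;  /* also keeps align8 from wrapping */
    size = align8(size);
    if (size > MEM_HUNK_BYTES - bump_len - temp_len) return NULL;
    void *p = &hunk[bump_len];
    bump_len += size;
    return p;
}

/* Remember the high water mark, and later drop everything above it. */
size_t mem_mark(void) { return bump_len; }

void mem_reset(size_t mark)
{
    assert(mark <= bump_len);
    bump_len = mark;
}

/* Short-lived allocation from the back; returns NULL when out of space. */
void *mem_temp_alloc(size_t size)
{
    if (size > MEM_HUNK_BYTES) return NULL;
    size = align8(size);
    if (size > MEM_HUNK_BYTES - bump_len - temp_len) return NULL;
    temp_len += size;
    return &hunk[MEM_HUNK_BYTES - temp_len];
}

/* Simplification: release all temporary memory in one go. */
void mem_temp_reset(void) { temp_len = 0; }

/* End-of-frame check: temporary objects must not outlive a frame. */
int mem_temp_is_empty(void) { return temp_len == 0; }
```

Typical use would be `size_t mark = mem_mark();` before loading a level, a series of `mem_bump` calls while loading, and `mem_reset(mark);` when the level ends. There is no per-object free to forget.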
When I was introduced to C during high school, my teacher presented C as a low-level language compared to what we previously studied (which was Ruby).
And I just ate that up because C looked less readable than Ruby. Today (10 years later) I have to disagree with my teacher. C is not a low-level language. It has access to the lower-level parts, sure, but it is a high-level language!
It was introduced to me as "glorified PDP11 Assembly Language". So the claim has been made at least once.
Granted, there are people here commenting that maybe assembly language is not "low-level". I'm lost for words.
I've done a handful of forth projects as part of a code-dojo years ago. I wouldn't have considered it low-level.
Note that I am not far in the forth rabbit hole at all, any interest I may show is incidental, a side effect of my interest in postscript, which is very much a high level language.
This has nothing to do with C. The hardware insists on this abstraction. And it's a good job too, otherwise your programs would stop working when moved to a machine with a different cache.
You see similar problems with things like vectorization – Rust was in a good position to define semantics more amenable to ARM SVE / Risc-V VE, but all existing SIMD libraries are written for C and x86 semantics, so that's what Rust is currently stuck with, as are most other languages.
The language matters less than the fact that there's a lot of existing code around. That code needs to keep working.
* tiered memory hierarchy pretending to be flat RAM
* CPUs that are much bigger than the ISA suggests, and which have out-of-order and speculative execution so code can make good use of their resources
* optimizing compilers that further decouple the program as written from its execution
IBM was working on this stuff in the 1970s, well before the rise of C. It’s fair to criticize the model and seek out alternatives, but it isn’t fair to blame C.
> One VLIW instruction word can contain several independent instructions, which can be executed in parallel without having to evaluate them for independence. A compiler must attempt to find valid combinations of instructions that can be executed at the same time, effectively performing the instruction scheduling that conventional superscalar processors must do in hardware at runtime.
If your CPU exposed the single-stream parallelism at the interface, you can do it at compile-time or even decide it with in-line assembler.
I wonder whether it hasn't caught on strictly because of the business dynamics of the industry, or whether there are technical reasons this isn't really a good strategy?
I believe Linus has a good rant at [0], even if it is only tangentially related to this exact topic. "While the RISC people were off trying to optimize their compilers to generate loops that used all 32 registers efficiently, the x86 implementors instead made the chip run fast on varied loads and used tons of register renaming hardware (and looking at _memory_ renaming too)."
The key design puzzle at present seems to be that memory access takes many more cycles than arithmetic. Bundling a few cycles of arithmetic with a few hundred cycles of memory load is kind of pointless. So VLIW works well if you know memory access is going to be fast, which roughly means knowing it'll hit in L1 cache or equivalent. I think that's part of why it suits DSP style systems.
Exposed pipelines are an interesting quirk of some of these systems. One instruction in a VLIW bundle writes to a register and subsequent instructions that read from that same register will see the previous value for N subsequent cycles, after which the write becomes visible. They're really confusing to program by hand but compilers can deal with that sort of scheduling.
With GPUs of course things have changed and in fact GPUs relied more on static scheduling, but even there as they expand to more varied loads, they are acquiring more dynamism.
Qualcomm's Hexagon is a VLIW; I think that's contemporary. Graphcore's IPU is two instructions per word.
On a relative scale, C is very low level compared to how we program today if you think about levels of abstraction.
If “low level” means “runs on the CPU almost literally as written.” then no it’s not.
Ada has System.Address to model machine addresses:
http://ada-auth.org/standards/rm12_w_tc1/html/RM-13-7.html#p...
C++ has std::less specializations for pointer types which provide a strict total order (one aspect of machine addresses):
https://en.cppreference.com/w/cpp/utility/functional/less
There is also placement new and std::launder for more explicit control of typed memory:
https://en.cppreference.com/w/cpp/language/new https://en.cppreference.com/w/cpp/utility/launder
These days, even Java tries to model machine addresses:
https://docs.oracle.com/en/java/javase/21/core/foreign-funct...
IRs like LLVMIR and various bytecodes. Well, those don't map to the hardware 1:1, not even close. So IR must be HLL.
Sure Asm has to be architecture specific, but even then we are getting pretty good at transpilation. And those codes get translated to opcodes anyways on most modern chips.
Basically, unless you are assembling on an ancient system or embedded processor, you aren't writing in a "low level language". Very few folks nowadays do this, so the term "LLL" doesn't occupy much mindshare in semantic space. That leads folks to populate it with what they perceive as low level - the lowest language on the abstraction tree they are likely to encounter - C.
This divide is only going to expand so I say we just accept the definition of low level language has shifted, and call anything where it does closely match... something else, I don't have a good term. Maybe "hardware level language".
But doesn't this still depend on what CPU you're talking about? Your C code will map much more closely to the instructions of the machine code of an 8051 or even an M4 than it will to an x86.
Thus any general-purpose language is more or less "low level" depending on the CPU it's running on. This seems like a poor definition.
>project it into a single dimension
>split it in the middle, thus inventing two useless artificial categories ("low level", "high level")
>get a bunch of highly functioning hackernews 0.1xers to argue endlessly about said useless categories
>submit weekly articles "thing X is NOT in my imaginary category Y!!!"
>profit
Arguing whether or not C is a low level language is about as useful as arguing whether dog-headed men have souls
Next up: IO is not a Monad, x86 machine code is not a low level language, RISC-V is not actually RISC, GPL is not actually open source and so on
My disagreement with the article (aside from the flamebait title) is that many of the things the author calls C problems are actually general computing issues. The reason highly threaded processors are not the norm is not that C can't take advantage of them (it does it just as well as 90% of other languages). The problem is that most problems aside from specialized domains are either highly sequential or require too much synchronization.
Regarding the immutable memory model example - C does not place any limitations at all. Just declare that modifying such an immutable object is undefined behaviour and let programmers figure it out. Memory already has its complexities with NUMA and such, C programmers have no issue taking advantage of these features.
Or maybe take TSX as an example - I'm fairly sure the PDP-11 did not have anything remotely close to Intel TSX and yet it is easy to use in C. Include <magic.h>, write __magicXYZ() and it just works.
Sure, existing C programs will run slowly on the author's imagined new processor architecture, but so will programs written in any language except maybe some highly restrictive very high level language (like GLSL on GPUs, etc.). But new programs that are written with such hardware in mind will not in any way be limited by C semantics and if they are (like with mistakes in standard such as errno for math functions...), it will be one compilation switch away from being fixed.
I guess the sophistication of compilers in recent decades adds to the argument since even the assembler (object code) the C compiler generates isn't going to be as expected due to hoisting things out of loops, common subexpression elimination, etc, etc.
Still, I think the notion of C being a "low level" language is still a useful label ... if not we need to retire this designation altogether.
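On the hoisting point, a small illustration with made-up function names. The first version is what one might write; the second is roughly what an optimizer is allowed to turn it into when it can prove the loop never writes to the string. Whether a given compiler actually performs this transformation is not guaranteed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* As written: strlen(s) appears to be evaluated on every iteration. */
size_t count_char_naive(const char *s, char c)
{
    assert(s != NULL);
    size_t count = 0;
    for (size_t i = 0; i < strlen(s); i++) {
        if (s[i] == c) count++;
    }
    return count;
}

/* Hoisted by hand: the loop-invariant length is computed once,
 * outside the loop. The observable result is identical. */
size_t count_char_hoisted(const char *s, char c)
{
    assert(s != NULL);
    size_t count = 0;
    const size_t len = strlen(s);
    for (size_t i = 0; i < len; i++) {
        if (s[i] == c) count++;
    }
    return count;
}
```

If the generated object code for the first function looks like the second, the source no longer tells you how many times `strlen` runs, which is the gap between "as written" and "as executed" being discussed.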
That's indeed an abstraction over the real computer, but it's a lot less things piled up on your virtual computer's model than C. Current assembly is about on the same level as C was when it was created. Current C is so high-level that it doesn't provide any functionality you can't get with a better, more modern language.
But yeah, I do agree that "low" and "high" level aren't useful names nowadays.
The second argument is that, because of the dominance of C, CPU designers have had to bend over backwards to create something that runs C naturally. Here there are examples like register renaming, flat memory, caching, etc. This argument also makes sense to me, but in the context of the first argument, and the title of the article, I'm not sure how it relates. Taken at face value, this seems to imply that it isn't even possible to create a low-level language on modern hardware, and even machine code is "high-level". This seems to argue that we would have to create a new generation of hardware that exposes much more complexity to the instruction set architecture, and only then could we design a low-level language to take advantage of that.
I think both of these arguments have merit, but it's a little disconcerting to put both of them in the same article, and to make the title "C is not a Low-Level Language". I suppose the first argument could go here, and the second argument could have been done in a follow-up article entitled "Machine code is not a Low-Level Language Either".
Their bigger point seems to be that C is no longer very mechanically sympathetic to huge modern cores, because the abstraction pretends there’s only one instruction in flight at a time. Is anyone aware of a language that fits the hardware better? Maybe Intel needs to release a “CUDA of CPUs” type language.
In any case, at this point, low level and high level are only meaningful relative to other languages.
The article is questioning how “close to the metal” C actually is, but some of the arguments also apply to assembler, which is not that close to the metal either these days.
It seems like the distinction between C and Assembly these days is less important than the distinction between C and say... Javascript. Which is fine by me- English is descriptive and the people who work in Assembly aren't going to get confused by it.
Unfortunately it's also the wrong set of abstractions for the contemporary era.
That said, if you're working in low-level embedded microcontroller world, C's memory model and program structure does in fact look a lot more like those systems.
The abstraction CPUs give you is more or less a fast pdp11 with some vector registers bolted on.
The implementation internally is not.
I then decided to make a language that compiles to C, it's just about adding strings, list and tuple. I almost finished the parser and the "translator" will take more time (I encourage anybody to try lexy as a parser combinator). Basically it will use a lot of the C semantics and even give C compiler errors, so it will save me a lot of work.
Of course I am very scared that I will run into awful problems, but that will be fun anyways.
:-)
In other words, perhaps what's really out of date is not C, but the concept of a general-purpose language which is equally well-suited to any type of task.
* We all agree that c gives you a lot of control to write efficient sequential code
* Modern processors aren't merely sequential processors
* Optimizing c code for a modern processor is hard because c is over-specified - in order to allow humans to manually optimize their programs (given the c memory model etc), it's hard for compilers to make assumptions about what optimizations they can make
It doesn't seem like this is a fundamental problem, though, and c could provide symbols that denote "use a less strict model here" (or even a compiler flag, although I bet incremental is the way to go)
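Arguably C99 already has one such marker: `restrict`, which is the programmer promising that two pointers do not alias so the compiler may assume less. A minimal sketch with a made-up function name:

```c
#include <assert.h>
#include <stddef.h>

/* Without restrict, the compiler must allow for dst and src overlapping,
 * so a store to dst[i] might change a later src[j]. With restrict the
 * caller promises they do not overlap, which leaves the compiler freer
 * to reorder or vectorize the loop. Breaking the promise is undefined
 * behaviour. */
void scale_add(float *restrict dst, const float *restrict src,
               float k, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++) {
        dst[i] += k * src[i];
    }
}
```

It is opt-in and incremental, in the spirit of the suggestion above, though it only addresses aliasing and not the rest of the memory model.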
To anyone debating whether C is low/not-low level language note that this discussion is at a much lower level so 'low' has a lower than common meaning.
Another red herring. Register rename isn't the result of some PDP fetishizing. It is a direct result of using more hardware resources than are exposed in the architectural model. Even if it were a stack machine or a dataflow graph architecture, register renaming is what you do when you have more dynamic names for storage than static names in the ISA.
The C abstract machine only has a flat memory model within a given malloc allocation (and within each local or static object). Relational pointer comparison between different allocations is UB (see e.g. https://stackoverflow.com/a/34973704).
So C is perfectly fine with a non-flat memory model as long as each object is confined within a flat memory region (by virtue of being allowed to alias it as a char array). You can imagine a C runtime library that provides functions to obtain pointers to different types of memory that don’t share a flat address space.
The only restriction is that pointers must carry enough information to compare unequal if they point to different objects. Of course, you might be able to construct a virtual flat memory model from the bit representation of void* or char*, but that’s not quite the same as imposing an actual flat memory model.
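A sketch of how that plays out in practice, with hypothetical function names. The first check is only defined when `p` really points into `buf`; the second goes through `uintptr_t` (an optional type, though available on mainstream platforms), which turns it into an ordinary integer comparison whose meaning is implementation-defined. That is exactly the "virtual flat model from the bit representation" mentioned above, not something the abstract machine promises.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Only well-defined when p points into buf (or one past its end):
 * relational comparison of pointers to different objects is undefined. */
int in_buffer_relational(const char *buf, size_t len, const char *p)
{
    return p >= buf && p < buf + len;
}

/* Compares integer representations instead. No undefined behaviour,
 * but the result is only meaningful where pointer-to-integer conversion
 * reflects a flat address space, which the C standard does not require. */
int in_buffer_integer(const char *buf, size_t len, const char *p)
{
    assert(buf != NULL);
    uintptr_t lo = (uintptr_t)buf;
    uintptr_t x = (uintptr_t)p;
    return x >= lo && x - lo < len;
}
```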
I feel that the article does explain well enough, how the author defines "low-level" for the sake of this article - and the definition being used seems just as fine as any other. And sticking with this specific definition, the conclusions of the article do seem to check out. (But I'm no expert on the subject matter, so I might be wrong about that).
I feel that the "value" of the article lies in challenging certain conceptions about C.
To me, it doesn't really matter if the article is (completely) right or not - the somewhat indignant response I see happening to the title of the article, and the discussion I see about what "low-level" actually means, seems to prove that some dogmatic beliefs about C are pretty deep-seated.
I feel it's always worthwhile to question such dogmatic beliefs.
No, Spectre is the direct result of processors speculatively executing code without respecting the conditions that guard the code. Hands down, processors hallucinate conditions in code. It has nothing to do with the particular computational model, but would happen in any system that speculates conditions.
And not just one branch, but a whole series of them. In fact, the processor is usually running with a whole buffer full of instructions that are executing in parallel, having been loaded into the reorder engine using nothing more than (normally highly accurate) statistical predictions.
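For readers who have not seen it, the pattern usually associated with this (bounds check bypass) looks harmless in C. The sketch below uses illustrative names and an assumed 64-byte cache line; it shows only the shape of the code, not an exploit.

```c
#include <stddef.h>
#include <stdint.h>

#define ARRAY1_SIZE 16
#define STRIDE 64            /* assumed cache-line size in bytes */

static uint8_t array1[ARRAY1_SIZE] = {
    0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
};
static uint8_t array2[256 * STRIDE];

/* Give each probe slot a recognizable value (for demonstration only). */
void probe_init(void)
{
    for (size_t k = 0; k < 256; k++) {
        array2[k * STRIDE] = (uint8_t)k;
    }
}

/* Architecturally, an out-of-range x never reaches the loads below.
 * A speculating CPU may still predict the branch as taken, read
 * array1[x] out of bounds, and pull in a cache line of array2 chosen
 * by that byte before the misprediction is rolled back. */
uint8_t victim(size_t x)
{
    if (x < ARRAY1_SIZE) {
        return array2[array1[x] * STRIDE];
    }
    return 0;
}
```

Nothing in the source is wrong by the language's rules; the leak lives entirely in which cache lines got touched during speculation, which is consistent with the point that this would happen under any language on a machine that speculates past conditions.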
I'm clearly not the target audience for this article, but I still feel like the author would be well advised to put a little note at the top that says "we're talking about CISC and high-end microprocessors rather than microcontrollers."
I'm also not seeing suggestions for languages that do map well to modern microprocessors.
Why do so many smart people waste their friggin' time on such nonsense?
It is perhaps true that CPUs and compilers should execute C code mostly as it is, with only local optimizations to spare programmer of having to decide whether x + x, or x * 2 or x << 1 is faster for example. This would improve system security and reliability while freeing up time to work on great compute languages for vectorizable computations.
But, at the end of the day, CPU makers and compiler writers are humans motivated by both career success and less tangible bragging rights. So OF COURSE they will chase benchmarks at the expense of everything else, even when benchmarks have little to do with real life performance in an average case. I have a 13 year old 17 inch MacBook pro I use for some favorite old games. When I fire it up, I don't see any differences in my computing experience vs a 2023 laptop. So whatever advances in CPU/compiler design were made since do not seem to help with tasks I am actually interested in.
Nobody wants to learn your programming language, write thousands-to-millions of dollars worth of software, just to have it become obsolete two days after the new-hotness processor comes out. Been there, done that.
Alternatively, perhaps, we can place the blame on hardware manufacturers who were looking to cut corners for improved performance and produced insecure machines because they lied to us non-expert hardware users about how fast their systems could go and what we were getting for our money.
Maybe there is a sweet-spot level of abstraction that allows for more fine-grained control of the modern machine, in the sense that compiled code more or less reflects written code, but not so fine-grained as to be unwieldy or non-portable.
Vectorized code that is native to the language could be done with either map functions or Python / NumPy / PyTorch style slicing, which is fairly intuitive. Multithreaded OTOH I’m not sure there is an easy answer.
I grew up with 6502 and 68k. To me, back in the early 90s, C (Mac MPW C to be precise) was an abstract assembler. The code-gen was perfectly readable.
Compared to the likes of Python, it most certainly is low-level. These types of language allow developers to rapidly get something going and not just because of the libraries.
I’d find it very hard to justify a business position where C has any other role than binding and breaking out into something more abstract. Be that Go or C++, for an example.
An argument I used to hear was “performance” from C. I’m not entirely convinced as in a higher language your algorithm may well be better as you can deal with the abstraction.
But… people make money coding C.
That abstraction is leaky, but the alternative is VLIW machines - even in that case, you probably end up using a compiler so that you don't have to worry about parallelism. Reasoning about parallel things is hard, that's why we spend so much time trying to avoid it ¯\_(ツ)_/¯
Yes, I do see Erlang mentioned but I don't think it was considered a solution.
What is irrelevant? To a data-scientist, typescript is low-level. You're required to think about structure and compile stuff!
To a web developer, C# and Java are low-level because you need to think about the execution platform
To an IT developer, C and C++ are low level because you need to think about memory.
To a game developer assembly is low level because you need to think about everything.
To electronics engineers everything is high level. To accountants VBA in Excel is low level. To a product manager a Word document with any sort of technical words is too low level.
If you need to optimize your software to the point where some CPU-specific instructions are required, C is too high level because it's hiding stuff that is not irrelevant.
The only way that the title is true is one that is not mentioned in the article: when C became popular, anything that was not assembly was a "high level language". Heck, even some Macro assemblers were considered high level, IIRC.
The factors that are mentioned in the article fall roughly into two categories:
1. The machine now works differently.
This may be true, but it does so almost entirely invisibly, and the exact same arguments given in the article apply in the same way not just to assembly language, but even to raw machine language.
I have a hard time seeing how machine level is not low level. But I guess opinions can differ. What seems inarguable is that machine language is the lowest level available. And if the lowest available level does not qualify as "low" in your taxonomy, then maybe you need to rethink your taxonomy.
2. C compilers do crazy shit now
This is also true, but it is true exactly because C is a low level language. As a low-level language, it ties execution semantics to the hardware, resulting in lots of undefined (and implementation defined) behavior that makes a lot of optimisations that some people really, really want to do (but which are far less useful than they claim) really really hard.
So C compiler engineers have defined a new language C' which has semantics that are much more amenable to optimisation. Nowadays they try to infer that language C' from the C source code and then optimize that program. And manhandle the C standard, which is intentionally somewhat loose, in order to make the C'' language that looks like C but maps to C' the official C language.
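One commonly cited example of the gap between "what the hardware would do" and what the optimizer is entitled to assume is signed overflow. The function names below are made up; the behaviour described in the comments follows from the C standard's rules for signed and unsigned arithmetic, though what any particular compiler emits is up to it.

```c
#include <limits.h>

/* Signed overflow is undefined, so a compiler may assume x + 1 never
 * wraps and is allowed to reduce this whole function to "return 1",
 * even though typical hardware would wrap at INT_MAX. */
int next_is_larger_signed(int x)
{
    return x + 1 > x;
}

/* Unsigned arithmetic is defined to wrap, so this must return 0 for
 * UINT_MAX and cannot be folded to a constant. */
int next_is_larger_unsigned(unsigned x)
{
    return x + 1u > x;
}
```

Calling the signed version with `INT_MAX` is the case where the program as written and the program the optimizer reasons about part ways.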
Since they were moderately successful, it can now be argued that C has morphed or been turned into a language that is no longer low level. However, the shenanigans that were and continue to be necessary to accomplish this make it pretty obvious that it is not the case that this "is" C.
Because, once again, those shenanigans were only necessary because C is a low level language that isn't really suited to these kinds of optimisations. Oh, and of course the rationale document(s) for the original ANSI C standard, which explicitly state that C should be suitable as a "portable assembly language".
But then again we already established that assembly is no longer a low level language...so whatever.