
zozbot234 · 4y ago

> Another issue is that some HPC silicon has unusual low-level concurrency and memory semantics with no analogues in ordinary CPU architectures

As long as it's properly reflected in the LLVM IR, I don't think this would be an issue.



jandrewrogers · 4y ago

That does not address the issue I was raising.

In some systems, sophisticated memory ownership and concurrency control are implemented in the hardware. These abstractions are very low-level and are intended to transparently support fine-grained and extremely high concurrency with almost no overhead.

The memory ownership model is oblivious to the programming language's concept of such a thing. As long as the compiler emits the appropriate memory control instructions, there can never be a conflict, no matter how many concurrent mutable references there are, and this conflict resolution is nearly free. This scales to thousands of cores and millions of concurrent threads.

The issue with porting Rust to this silicon, without thinking about it too deeply, seems to be two-fold:

First, code organized to satisfy the borrow checker, etc., is an anti-optimization on these CPUs because they are specifically designed to encourage you to ignore memory ownership and mutability at the code level. This enables absurdly concurrent code execution on mutable shared memory that would be difficult or impossible to write safely on an ordinary CPU in any language.

Second, you would probably want the Rust compiler to ignore many issues of ownership and concurrency, instead emitting the appropriate memory control instructions and delegating them to the hardware, since the hardware handles them much better. This seems non-trivial, particularly since the hardware model does not map perfectly onto Rust's model in the details (which vary by CPU design). It also creates a situation where code that compiles safely on this silicon won't compile on, e.g., x86, because the safety is provided by the hardware rather than at compile time.
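
For anyone who hasn't run into the first point in practice, here is a minimal sketch (plain Rust on an ordinary CPU; the function name `parallel_add` is made up for illustration) of what "organized to satisfy the borrow checker" usually looks like. The compiler won't let two threads hold `&mut` to the same buffer, so the data gets carved into disjoint chunks with exactly one owner each, instead of letting every thread touch whatever it needs.

```rust
use std::thread;

/// Hypothetical helper: adds `delta` to every element using up to `n_threads` threads.
///
/// Two threads may not hold `&mut data` at the same time, so the slice is split
/// into disjoint `&mut` chunks, one per thread. `thread::scope` joins every
/// spawned thread before returning, so `data` is usable again afterwards.
pub fn parallel_add(data: &mut [u64], delta: u64, n_threads: usize) {
    let n = n_threads.max(1);
    // Ceiling division so every element lands in a chunk; `chunks_mut(0)` panics, hence max(1).
    let chunk_len = ((data.len() + n - 1) / n).max(1);
    thread::scope(|s| {
        for chunk in data.chunks_mut(chunk_len) {
            // Each closure takes exclusive ownership of its own chunk.
            s.spawn(move || {
                for x in chunk.iter_mut() {
                    *x += delta;
                }
            });
        }
    });
    // The "just let every thread write wherever it wants" version, e.g.
    // spawning two closures that both do `data[i] += 1` on the same `data`,
    // is rejected at compile time for holding two mutable borrows at once.
}
```

On the hardware described above, that partitioning step is pure overhead; the machine would arbitrate the conflicting writes itself.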

Ironically, a large part of the reason these CPUs have never been commercially successful is that people don't know how to write idiomatic massively concurrent mutable code. The assumption that thread concurrency is expensive and dangerous is so pervasive that everyone writes code that tacitly assumes it even when it isn't true for the given hardware, with significant loss of performance. Idiomatic algorithms and data structures in C++ on these platforms look very different from what you would use on, e.g., x86 or ARM.
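
As a rough illustration of that last point, here is the same histogram written two ways, sketched in Rust (function names are hypothetical). The first is the contention-avoiding idiom you'd normally reach for on x86 or ARM: private per-thread copies merged at the end. The second is the shared-mutable style described above as natural on these machines, emulated here with ordinary atomics, so on a commodity CPU it will often be the slower of the two. Both produce identical results; only the performance profile depends on the hardware.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

fn chunk_len(len: usize, n_threads: usize) -> usize {
    let n = n_threads.max(1);
    ((len + n - 1) / n).max(1)
}

/// x86/ARM idiom: each thread fills a private histogram; results are merged
/// afterwards, so threads never write to shared bins.
pub fn histogram_private(values: &[u8], n_threads: usize) -> Vec<u64> {
    let partials: Vec<Vec<u64>> = thread::scope(|s| {
        let handles: Vec<_> = values
            .chunks(chunk_len(values.len(), n_threads))
            .map(|chunk| {
                s.spawn(move || {
                    let mut local = vec![0u64; 256];
                    for &v in chunk {
                        local[v as usize] += 1;
                    }
                    local
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    let mut total = vec![0u64; 256];
    for local in partials {
        for (t, l) in total.iter_mut().zip(local) {
            *t += l;
        }
    }
    total
}

/// Shared-bins style: every thread updates one shared histogram directly.
/// Atomics stand in for hardware-arbitrated memory; on an ordinary CPU hot
/// bins bounce cache lines between cores.
pub fn histogram_shared(values: &[u8], n_threads: usize) -> Vec<u64> {
    let bins: Vec<AtomicU64> = (0..256).map(|_| AtomicU64::new(0)).collect();
    thread::scope(|s| {
        for chunk in values.chunks(chunk_len(values.len(), n_threads)) {
            let bins = &bins;
            s.spawn(move || {
                for &v in chunk {
                    // Relaxed is enough: the scope's join orders these before the read below.
                    bins[v as usize].fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    });
    bins.into_iter().map(AtomicU64::into_inner).collect()
}
```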

SubjectToChange · 4y ago

Whatever system you are talking about doesn't exist, not in the way you described at least.

jandrewrogers · 4y ago

The prototypes for architectures with these properties were the Tera MTA supercomputers. These were evolved by Cray, but a few other companies (including Intel) experimented with and produced their own variants of the concept. It was a great model once you grokked it.

The general idea is that every memory address has several semantic bits that annotate the contents on load and store. Each core has multiple independent hardware threads (128 in the case of MTA), which adapt their scheduling to those annotations on a clock-cycle-by-clock-cycle basis. You can design massively multithreaded code for these platforms with almost perfect scalability that would have catastrophically high contention and overhead anywhere else, which was the point.

There are quirks to designing software for these systems, but they don’t involve safety.
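
A toy software model of the per-word bits may help. It assumes the simplest such bit, a full/empty tag (the Tera MTA kept one per memory word). `FullEmptyWord`, `read_fe`, `write_ef` and `shared_increment` are hypothetical names, and the Mutex/Condvar pair is only an emulation of what the hardware does by parking the hardware thread until the tag changes. The point is the programming model: every thread does a naive read-modify-write on one shared word, and the tag is the only synchronization the algorithm itself uses.

```rust
use std::sync::{Condvar, Mutex};
use std::thread;

/// Hypothetical emulation of one memory word with a full/empty tag bit.
/// In hardware the tag sits next to the word; here a Mutex + Condvar fake it.
pub struct FullEmptyWord {
    state: Mutex<(u64, bool)>, // (value, is_full)
    changed: Condvar,
}

impl FullEmptyWord {
    /// A word that starts out full, holding `value`.
    pub fn new_full(value: u64) -> Self {
        Self { state: Mutex::new((value, true)), changed: Condvar::new() }
    }

    /// Wait until the word is full, read it, and mark it empty.
    pub fn read_fe(&self) -> u64 {
        let mut st = self.changed.wait_while(self.state.lock().unwrap(), |s| !s.1).unwrap();
        st.1 = false;
        let v = st.0;
        drop(st);
        self.changed.notify_all();
        v
    }

    /// Wait until the word is empty, write it, and mark it full.
    pub fn write_ef(&self, value: u64) {
        let mut st = self.changed.wait_while(self.state.lock().unwrap(), |s| s.1).unwrap();
        *st = (value, true);
        drop(st);
        self.changed.notify_all();
    }
}

/// Every thread increments the same shared word; the empty state left by
/// `read_fe` makes other readers wait until `write_ef` refills the word.
pub fn shared_increment(n_threads: usize, per_thread: u64) -> u64 {
    let counter = FullEmptyWord::new_full(0);
    thread::scope(|s| {
        for _ in 0..n_threads {
            s.spawn(|| {
                for _ in 0..per_thread {
                    let v = counter.read_fe(); // word now empty: other threads block
                    counter.write_ef(v + 1); // word full again: one waiter proceeds
                }
            });
        }
    });
    counter.read_fe()
}
```

On an ordinary CPU this is a slow way to count; on full/empty-bit hardware the same pattern costs roughly one tagged load and one tagged store per update.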

