However, I doubt the efficacy of your C++ experts: most of the people I know who write C++ are actually really bad at optimizing code. They mostly use it for legacy reasons. If you get a team of experienced (and expensive) systems programmers, you will likely get a slightly better result than your GC algorithm.
Which is a very good reason to develop an optimized GC algorithm: the domain experts can crank out code without having to optimize every single memory (de)allocation, which sounds like a waste of their time.
It’s funny: people don’t usually doubt that a modern compiler can do a better optimization job than an expert, but add a memory management algorithm and that’s a bridge too far.
However, a GC is a lot slower than manual memory management, which contrasts with the fact that most compiler activities are actually pretty low in overhead (now; it didn't use to be this way). Really, the only cost overhead left is the abstraction mismatch, and that is not too bad when you compare it to how bad humans are at writing assembly.
That said, this case looks like one where the C++ experts spent very little time optimizing (mostly writing business logic), and probably made a very poor choice of tools.
There is a triangle of GC performance: throughput, latency (i.e. pause length), and memory overhead. Manual memory management will often be slower (in the throughput sense) than a throughput-tuned GC because:
1. Manual memory management typically precludes moving live data
2. Manual memory management often frees data as soon as it is dead
GC will often have faster allocations than manual memory management because #1 makes it possible to just use a pointer-increment for allocation. GC will often have faster freeing of data because of #2; in particular using a nursery with Cheney's algorithm makes it O(1) to free an arbitrary amount of data.
Where a throughput optimized GC falls down is in that any code that allocates may have an unpredictable amount of delay.
Also note that for video games, both typical GC and malloc/free are often too slow for per-frame data, so arena allocators are used, which sidestep #2, and allow a pointer-increment allocation without needing #1. This is specifically because there are a lot of objects with exactly the same bounds on their lifetime. Special-purpose algorithms will almost always trump general-purpose algorithms when run on the workload they are optimized for.
extern void foo(T *p); // some arbitrary function

void bar1(bool cond)
{
    ...
    std::unique_ptr<T, your_deleter> p(new T()); // note: std::make_unique doesn't accept a custom deleter
    if (cond) { return foo(p.release()); }
    ...
}
This requires the compiler to emit the unique_ptr destructor's cleanup path (at minimum a null check on the released pointer) regardless of whether cond is true or false, even though it's unnecessary (and can thus be slower) in the case where cond is true. Moreover, the obvious way to avoid it is to write it "C-style":

void bar2(bool cond)
{
    ...
    auto p = new T();
    if (cond) { return foo(p); }
    ...
    your_deleter()(p);
}
which can end up being faster when cond is true. But this isn't something an expert would generally want to do, as the C++ code now becomes unidiomatic, fragile, and unmaintainable. In an ideal world, though, you could have an optimizer smart enough to do that transformation automatically. C++ compilers already do it in trivial cases, but they can't do it in general. My impression is that their Haskell compiler exploits internal knowledge of what your_deleter does (i.e. reference counting) in order to optimize the code in various ways: optimizing out such code, consolidating refcount updates, etc. And if I understand this correctly, there's no surprise at all that it can be faster than idiomatic C++ code written even by experts.
The question for me isn't the expertise of their programmers. Perhaps in their case they genuinely do need to have lots of objects on the heap, have (say) tight loops where they (for whatever reason) nevertheless cannot avoid the heap allocations, and don't have much of a use for finalizers besides freeing memory. In which case, I'm not surprised their solution clearly delivers better results than the C++ equivalent. The question for me, instead, is how well they think that generalizes, such as to (a) well-written Haskell programs in general, (b) well-written C++ programs in general, and/or (c) other domains. It would be one thing if their solution delivers better results in Haskell than C++ for their use case; it would be another thing if they could claim their solution delivers better results in Haskell than C++ for most use cases.
This is a non-issue.
The point is that check itself is an extra instruction (or two, rather) that would otherwise be skipped entirely.
I'm not saying this commonly makes a difference. I'm just saying this might be something that does make a difference for them in their particular use case.
Also note that I was trying to describe the general phenomenon with a simple example, but this obviously isn't limited to std::unique_ptr.