undefined | Better HN

0 pointsvitalyd11y ago0 comments

GC definitely makes some types of systems easier to develop and reason about, but it also tends to make people be a bit sloppy with respect to memory management. For systems where managing data is one of their reasons for existence, it creates a negative effect. GC'd runtimes also bloat memory consumption to support GC, as well as perform additional "hidden" operations in support of GC (e.g. card marking), even if you manage to not trigger GCs. In addition, GC works well when the generational hypothesis holds. For data intensive applications with mutation/churn, that doesn't tend to be the case.

I disagree that avoiding GC is easier; you're swimming upstream the entire time. Managing memory manually in GC environments is annoying because the platform wasn't designed for that.

As Frank said, you want to avoid sharing mutable data irrespective of GC or not - that won't scale well with larger core counts. Instead, you want to delineate ownership so you have single writer and many readers, and share the data that way.

0 comments

3 comments · 1 top-level

pron11y ago· 2 in thread

The key is that most of the effort is not where most of the data is. It's the small percentage of data (say, database index) that requires the biggest development effort, and you want the best mechanism to assist not where most bytes are stored, but where the most effort is invested.

> For systems where managing data is one of their reasons for existence, it creates a negative effect.

I think that people who author such systems/libraries know when to rely on the GC and when not to.

> GC'd runtimes also bloat memory consumption to support GC

When you write, say, an in-memory OLTP database (that's the use-case I know best), you keep the index -- that's where concurrency is managed -- on the heap and the user data off-heap. User data is the bulk of the RAM, but you only pay the GC footprint-overhead for the index.

> you're swimming upstream the entire time.

Going off-heap for large, known-lifespan data sets in Java has been widespread -- idiomatic, almost -- for at least five years now, if not ten. There are so many libraries that do that, that it's very, very simple. Again -- that's just where most data lies, not where the most effort is required. You end up with a very small percentage of your code that is not quite "beginner's Java" -- that's not swimming against the current; that's hitting a few strokes for one tiny section of the journey. It's managing the transactions and update to the indexes that requires 95% of the code, and where you end up thanking your lucky stars for a good GC.

I haven't measured exactly, but the entire off-heap code in our database is under 3%, even though over 85-95% of the data is stored off-heap.

> Instead, you want to delineate ownership so you have single writer and many readers, and share the data that way.

Even single writer/many readers still require a concurrent access mechanism. A GC tremendously helps with this pattern.

Besides, the best way to manage data -- and that's based on how caching mechanisms work at every level -- is single-writer/multiple-readers approximately, namely the single-writer may change identity at any time, but you ensure that it happens rarely. Once you try to make ownership permanent, system-wide locking is required. It's that flexibility that helps with amortizing the costs and making sure you're never hit with the worst case at every transaction[1]. There's some literature touching on that required flexibility, but here's my own take: http://highscalability.com/blog/2012/8/20/the-performance-of... (that flexibility is required to keep the B-tree evenly distributed across nodes -- they can be across the network or even NUMA sockets).

[1]: Changing the owner is the amortization pressure-release valve (all amortized algorithms work on that idea of slowly increasing pressure, that is then released seldom but regularly). It replaces sharding's rebalancing, which usually requires a stop-the-world lock (or most of the world).

vitalydOP11y ago

>The key is that most of the effort is not where most of the data is. It's the small percentage of data (say, database index) that requires the biggest development effort, and you want the best mechanism to assist not where most bytes are stored, but where the most effort is invested.

There's also substantial amount of code around dealing with i/o efficiently (and not churning garbage), both net and disk (if there's any persistence involved in the product).

>I think that people who author such systems/libraries know when to rely on the GC and when not to.

Perhaps there's more awareness now, but there are plenty of JVM based systems where moving off-heap (and/or changing code to ease GC workload) came much later in the dev cycle.

>Going off-heap for large, known-lifespan data sets in Java has been widespread -- idiomatic, almost -- for at least five years now, if not ten. There are so many libraries that do that, that it's very, very simple. Again -- that's just where most data lies, not where the most effort is required. You end up with a very small percentage of your code that is not quite "beginner's Java" -- that's not swimming against the current; that's hitting a few strokes for one tiny section of the journey. It's managing the transactions and update to the indexes that requires 95% of the code, and where you end up thanking your lucky stars for a good GC.

IME, you end up not using a good bulk of JDK and 3rd party libs because they allocate frivolously. You end up having to look at internals of other code to see if they allocate or not since nobody in the java space cares about documenting that aspect (because allocation is mundane). If you then need to tune GC knobs, may the force be with you. On top of that, once you go off-heap, you're doing roughly the same style of block memory management that you'd do natively, except less efficiently (e.g. some Unsafe operations aren't JIT optimized as well, some operations aren't even intrinsics so you get JNI call overhead, you always get zero'd memory if you call Unsafe.allocate() even if you then overwrite everything, etc). You're probably going to now tell me that getting 90% of the performance is fine ...

I suggest you look at some non-managed OSS systems that deal with high-perf workloads. Your posts always make it sound like it's nigh near impossible to build anything of the sort without GC, which is ridiculous. If you want to see what a modern C++ rendition would look like that attempts to avoid sharing mutable data, have a look at https://github.com/cloudius-systems/seastar

pron11y ago

> IME, you end up not using a good bulk of JDK and 3rd party libs because they allocate frivolously.

That's more of your concern because you care about worst-case (very) low-latency domains. Short-lived objects are not a big concern for those who just wish to avoid the large pauses (and can live with the very occasional query taking 40ms instead of the average 50us).

> with i/o efficiently (and not churning garbage), both net and disk

At least one of the most popular Java IO libraries don't create much garbage (Netty 4), and besides, that's not a major concern at all because short-lived etc...

> you're doing roughly the same style of block memory management that you'd do natively, except less efficiently

Only for very simple lifespans. The heap data guards the offheap data wrt concurrency. And so what? All of it still amounts to a tiny, almost negligible, portion of the code, and that portion turns out to be quite simple. I'd rather write a small simple package for stuff that would be taken care of automatically by a non-GC language than write lots of complex, bug-prone code in a non-GC language for stuff that's taken care of automatically in GC runtimes.

> you always get zero'd memory if you call Unsafe.allocate()

But you don't allocate offheap slabs often at all.

> Your posts always make it sound like it's nigh near impossible to build anything of the sort without GC

It's much more work, and requires placing and reasoning about various constraints that may prove problematic. An example later.

> If you want to see what a modern C++ rendition would look like that attempts to avoid sharing mutable data, have a look at https://github.com/cloudius-systems/seastar

Thank you, I will. The example I like to give is that of VoltDB, a "NewSQL" database written in Java and C++ (C++ for the core) which uses internal sharding. Every thread is responsible for a slice of the data. But that database (designed or supervised by none other than Michael Stonebraker) requires locking for any join (even they couldn't have come up with a better way to do it). But their approach displays results that are far from groundbreaking (for modern, in-memory databases), and places a lot of burden on the user to carefully choose a sharding strategy. As another example, MemSQL don't shard, and use a lock-free index (they use concurrent skip-lists). They wrote the code in C++, so obviously that's possible, but put a lot of effort into just implementing the skip-lists (they use hazard pointers), and ended up with results that could have been just as good -- or better -- with a general-purpose GC. BTW, another thing they missed by picking C++, is that in order to make queries faster, they compile SQL to C++ at runtime, compile that C++ and load the resulting code. Java would have given them a much more efficient JIT.

> You're probably going to now tell me that getting 90% of the performance is fine ...

As a general case -- absolutely. In this case, Java gives you more like 96-97% performance for significantly less effort.

EDIT:

I've now read about seastar (and realized I know the people behind it). That approach is the absolute best way to get the worst-case lowest latency possible, if you're willing to pay a significant cost in design (how do you transactionally update data on two shards? -- you can't. Imagine a game object moving from one shard to another -- it will either show up twice or disappear momentarily). But if you're willing to relax the worst-case latency, you can gain significant gains in both throughput and ease of development (and still keep latency low enough for almost all domains). That's certainly how I'd design video-streaming or a network/phone switch -- not so much an online game, IoT sensor-fusion, or a group-chat app.

j / k navigate · click thread line to collapse