
0 points · darksaints · 5y ago

Don't forget scanning. Yes, moving blocks of memory around is expensive, but it can also be done concurrently. Scanning, AFAIK, cannot be done concurrently, and thus remains the primary blocker to lower latency. And scanning is something that is entirely eliminated with static memory management.

10 comments · 6 top-level

pron 5y ago

Scanning is most certainly done concurrently with ZGC. Even root scanning is on its way to becoming fully concurrent, which is why we're nearing the goal of <1ms latency.

jerrinot 5y ago

I didn't know about fully concurrent root scanning, thank you!

How does root scanning work with respect to Loom? Are stacks of virtual threads treated as roots? I guess there is no other option?

pron 5y ago

No, virtual thread stacks are not roots! This is one of the main design highlights of the current Loom implementation. In fact, at least currently, the VM doesn't maintain any list of virtual threads at all. They are just Java objects, but the GC does treat them specially.

pkolaczk 5y ago

Right, it is concurrent, but it is still costly. It brings rarely used data into the caches and pushes useful data out of the caches. If some parts of the heap were swapped out, the impact of concurrent scanning can be quite dramatic.

pron 5y ago

Ah, but you can pin GC threads to specific cores, and a reference-counting GC also has such non-trivial costs. In practice, however, people in the 90-95% "mainstream" domain that Java targets are very happy with the results. Of course, there are some applications that must incur the costs of not having a GC. In general, though, the main tangible cost of a GC today, for a huge portion of large-scale applications, is neither throughput nor latency but RAM overhead.

haxen 5y ago

All the modern GCs scan the heap concurrently; the hardest problem is scanning the GC roots in the call stack. ZGC is currently implementing concurrent stack scanning.

wokkel 5y ago

Yes it can. Stop-the-world collection is a thing of the past. See for example: https://developers.redhat.com/blog/2019/06/27/shenandoah-gc-...

jerrinot 5y ago

Modern garbage collectors do concurrent scanning.

I believe most GC implementations have a non-concurrent "initial marking" phase, but that's typically fairly quick. It has to scan the roots of your object graph: think stack, JNI, etc.

dan-robertson 5y ago

Scanning can be done incrementally with each allocation, so that allocations become slightly more expensive but no individual allocation does loads of scanning work. Scanning can also be done concurrently.
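To make the "a little scanning per allocation" idea concrete, here is a minimal toy sketch in Java. It is not how HotSpot, ZGC, or Shenandoah work internally; the `Node` and `IncrementalHeap` classes, the `SCAN_BUDGET` constant, and the simple insertion write barrier are all assumptions made up for illustration. Each call to `allocate` scans at most a fixed number of already-reached objects before handing out the new one, so marking work is spread across many allocations.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model only: all names here are hypothetical, not a real JVM API.
final class Node {
    final String name;
    final List<Node> refs = new ArrayList<>(); // mutate only via IncrementalHeap.addRef
    boolean marked; // true once the collector has reached this object

    Node(String name) { this.name = name; }
}

final class IncrementalHeap {
    // Assumed budget: how many objects one allocation is allowed to scan.
    private static final int SCAN_BUDGET = 2;

    private final List<Node> objects = new ArrayList<>();
    // "Grey" objects: reached, but their outgoing references are not scanned yet.
    private final Deque<Node> grey = new ArrayDeque<>();

    // Begin a marking cycle: clear old marks and shade the roots grey.
    void startMarking(List<Node> roots) {
        for (Node o : objects) o.marked = false;
        grey.clear();
        for (Node r : roots) shade(r);
    }

    // Every allocation pays for a small, bounded slice of scanning work.
    Node allocate(String name) {
        scanStep(SCAN_BUDGET);
        Node n = new Node(name);
        n.marked = true; // objects born during a cycle are treated as live for that cycle
        objects.add(n);
        return n;
    }

    // Insertion write barrier: a newly stored reference target is shaded,
    // so the mutator cannot hide a live object from an in-progress cycle.
    void addRef(Node from, Node to) {
        from.refs.add(to);
        shade(to);
    }

    // Scan up to `budget` grey objects; returns how many were scanned.
    int scanStep(int budget) {
        int scanned = 0;
        while (scanned < budget && !grey.isEmpty()) {
            Node n = grey.pop();
            for (Node child : n.refs) shade(child);
            scanned++;
        }
        return scanned;
    }

    boolean markingDone() { return grey.isEmpty(); }

    private void shade(Node n) {
        if (!n.marked) {
            n.marked = true;
            grey.push(n);
        }
    }
}
```

With a chain of four reachable objects and a budget of two, two subsequent allocations are enough to finish marking, and no single allocation scans more than two objects. A real collector would size the budget relative to the allocation rate so that marking finishes before the heap fills up.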

rurban 5y ago

Scanning is also entirely eliminated by using no global heap allocations. With copying and small stack allocations there's no need to scan much, and pauses can easily stay below 1ms.
