G1 Garbage Collection JVM7 - Big Performance Problems Shown

(nerds-central.blogspot.com)

48 points · NerdsCentral · 14y ago · 27 comments

sehugg · 14y ago

A microbenchmark like this is not very conclusive. The G1 collector is designed to handle the heap fragmentation problem and thus reduce the maximum pause time due to stop-the-world GC. It's not designed for maximum throughput.

Each GC algorithm has its worst-case behavior; saying "avoid at all costs" because of a single scenario is not very helpful.

Also, what were the JVM flags for each test? All I can see in these graphs is the bump at the beginning before the heap has sized to a stable level.

NerdsCentral (OP) · 14y ago

Oh, also: whilst I get that G1 is not designed for maximum throughput, note that it has a longer maximum pause as well. I did mention this in the post. So, for this situation, it fails in both regards. Now, whilst this is a benchmark and not a real-world system, it is cause for some concern that the G1 collector slows down code execution so very much.

NerdsCentral (OP) · 14y ago

I hope that the article does not imply that it is conclusive but rather that it is cause for concern. The flags were only those required to set the garbage collector - everything else is default. If you want to try other settings, the code is there to use :)
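Since the question of which flags and which collector were active comes up here, one way to record that alongside a benchmark run is to ask the JVM itself. The following is a minimal sketch using the standard `java.lang.management` API; the class name `GcReport` is made up for illustration, and the collector names it prints depend on the JVM and on whichever selection flag was passed (for example `-XX:+UseG1GC` on HotSpot).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original benchmark.
final class GcReport {
    /** Returns one line per collector the running JVM has registered. */
    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the value is undefined for a collector.
            lines.add(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", totalMillis=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        // The arguments the JVM was started with, including any collector flag.
        System.out.println("JVM args: "
                + ManagementFactory.getRuntimeMXBean().getInputArguments());
        for (String line : describeCollectors()) {
            System.out.println(line);
        }
    }
}
```

Printing this at the start and end of each run would make it easier for readers to compare results across collectors and settings.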

puredanger · 14y ago

I don't know why I would ever judge the performance of a non-tuned GC for any collector, especially when the app is one designed to make life hard for the collector. There is no useful conclusion you can draw from this test.

fleitz · 14y ago

"a realtime system must have a deterministic time to complete a task".

Incorrect: a realtime system must only have execution times below a certain threshold. malloc/free is just as non-deterministic as GC-allocated memory (in practical systems). If you put effort into it you can make malloc/free do really stupid things too. If I wrote a program that allocated and freed strings of the right sizes, the program would eventually crash.

With any sort of modern CPU it would be almost impossible to produce non-trivial programs with deterministic execution times. I'd hazard a guess that there are very few 'realtime' programs that couldn't be written using the JVM. If you can do HFT on the JVM it's realtime enough.

To make systems with deterministic execution times you'd have to start looking at extremely limited processors that lack RAM (or have really exotic RAM that doesn't refresh), as well as removing other resources that are essential to building software the modern way.

All this proves is that for this particular problem the JVM is probably unsuitable for a realtime system. The fortunate thing is that I don't think any real time program actually needs to allocate strings in this manner.

In all seriousness, most of the reason that the JVM is not used on realtime systems is that there aren't very many cheap CPUs capable of executing Java bytecode that are certified for operation in the harsh environments that a lot of realtime systems operate in. My friend builds realtime systems and he codes in C because that's the only compiler available for the CPU.

He'd start using Netduino in a heartbeat if it could survive being next to an electric generator inside the Hoover dam.

wglb · 14y ago

Slightly off topic, but do you have particular information about JVM being used in HFT environment?

fleitz · 14y ago

http://martinfowler.com/articles/lmax.html?t=1319912579

This is actually a trading platform but if you can run the exchange on the JVM you could certainly write a client for it on the JVM.

Also: http://www.quantfinancejobs.com/jobs/java.asp

wglb · 14y ago

Having done some hard-core real-time programming in the past, and currently thinking about a low-latency benchmark between languages, I am not sure that this sort of test is useful in a real-time environment. Also, using a garbage-collected language in a real-time environment would require some extra care.

One of the things to consider in engineering a real-time environment is what is the central data-flow load of the system. For example, in a medical data-acquisition environment, allocating an object for each tick would not likely be wise. In fact, use of malloc might not be called for, statically allocating buffers being a better avenue.

In the case of a low-latency financial trading system written in a garbage-collected language, it might be prudent to allocate all the objects before 0830 (if that is when trading starts) and free them after the market close.
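The "allocate everything before the open" idea above can be sketched as a simple object pool. This is only an illustration under assumptions: the `Order` type and its fields are invented here, the pool is single-threaded, and a real system would also need to avoid allocation in the rest of the hot path.

```java
import java.util.ArrayDeque;

// Hypothetical pool; names and fields are illustrative, not from any real trading system.
final class OrderPool {
    /** Mutable message object that is reused instead of reallocated. */
    static final class Order {
        long id;
        long priceTicks;
        int quantity;

        void clear() {
            id = 0;
            priceTicks = 0;
            quantity = 0;
        }
    }

    private final ArrayDeque<Order> free;

    /** Allocates every Order up front, e.g. before the trading session opens. */
    OrderPool(int capacity) {
        free = new ArrayDeque<>(capacity);
        for (int i = 0; i < capacity; i++) {
            free.push(new Order());
        }
    }

    /** Hands out a pre-allocated Order, or null if the pool is exhausted. */
    Order acquire() {
        return free.poll();
    }

    /** Resets an Order and returns it to the pool for reuse. */
    void release(Order order) {
        order.clear();
        free.push(order);
    }

    int available() {
        return free.size();
    }

    public static void main(String[] args) {
        OrderPool pool = new OrderPool(1024);
        Order order = pool.acquire();
        order.id = 42;
        order.quantity = 100;
        pool.release(order);
        System.out.println("orders available: " + pool.available());
    }
}
```

Returning null on exhaustion is a deliberate choice in this sketch: it forces the caller to decide what to do rather than silently allocating during the session.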

There is a wide range of real-time response requirements (for financial exchanges, response times of less than a hundred milliseconds (e.g., 395 from BATS) for full turnaround are expected). Electrocardiogram data needs to be sampled once each millisecond, but the jitter in sampling times needs to be low.
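To make "jitter in sampling times" concrete, here is a small sketch that takes recorded sample timestamps and reports the worst deviation from the nominal period. The class and method names are assumptions for illustration, and the `main` method uses a sleep-based loop only as a crude demonstration, not as a suggestion for how a real acquisition system would be timed.

```java
import java.util.concurrent.locks.LockSupport;

// Hypothetical helper for illustrating sampling jitter.
final class SamplingJitter {
    /** Largest absolute deviation (ns) of any inter-sample interval from the nominal period. */
    static long maxJitterNanos(long[] timestampsNanos, long periodNanos) {
        long worst = 0;
        for (int i = 1; i < timestampsNanos.length; i++) {
            long interval = timestampsNanos[i] - timestampsNanos[i - 1];
            worst = Math.max(worst, Math.abs(interval - periodNanos));
        }
        return worst;
    }

    public static void main(String[] args) {
        long periodNanos = 1_000_000L; // nominal 1 ms between samples
        long[] stamps = new long[200];
        for (int i = 0; i < stamps.length; i++) {
            stamps[i] = System.nanoTime();
            // Crude pacing: parks for roughly one period, subject to OS scheduling.
            LockSupport.parkNanos(periodNanos);
        }
        System.out.println("max jitter (ns): " + maxJitterNanos(stamps, periodNanos));
    }
}
```

Running this under different collectors while other threads allocate heavily would give a rough feel for how GC pauses show up as jitter, though the numbers would also include OS scheduling noise.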

Looks like some good tools there, but without the engineering context it is hard to draw a useful conclusion.

NerdsCentral (OP) · 14y ago

Totally agree. My original aim was to use an idea I had about the way the garbage collector synchronises in the Oracle JVM to give 'windows' of determinism. So I set about making a program to push the gc really hard so I could see if my idea worked. But I have not gotten as far as the original aim yet because I found this interesting stuff about the G1 collector.

As for your points about realtime, I am completely with you. I had in mind that one might have a realtime system on the rtjvm or in C++ which needed to periodically communicate with a standard JVM. There are three approaches I could see for this. One would be to send messages and decouple that way. Another is to make the communication abort if it looked like it might cause a deadline miss. The third was to briefly (milliseconds) disable the GC in the standard JVM so the communication could occur and then turn it back on again immediately afterwards.
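The first two approaches above (decouple via messages, and abort rather than miss a deadline) can be combined in a bounded queue with a time budget on the sending side. This is a rough sketch, not the poster's design: the class name `DeadlineHandoff`, the `String` payload, and the budget values are all assumptions, and the timeout on `offer` is only as precise as the underlying platform allows.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical hand-off between a deadline-sensitive sender and an ordinary JVM consumer.
final class DeadlineHandoff {
    private final BlockingQueue<String> outbox;

    DeadlineHandoff(int capacity) {
        outbox = new ArrayBlockingQueue<>(capacity);
    }

    /**
     * Tries to enqueue a message, waiting at most budgetMicros for space.
     * Returns false (message dropped) if the budget runs out or the thread is interrupted.
     */
    boolean trySend(String message, long budgetMicros) {
        try {
            return outbox.offer(message, budgetMicros, TimeUnit.MICROSECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    /** Called by the non-realtime side; returns null when nothing is pending. */
    String poll() {
        return outbox.poll();
    }

    public static void main(String[] args) {
        DeadlineHandoff handoff = new DeadlineHandoff(1);
        System.out.println("first send accepted: " + handoff.trySend("tick-1", 200));
        // The queue is full, so this send gives up once its budget is spent.
        System.out.println("second send accepted: " + handoff.trySend("tick-2", 200));
        System.out.println("consumer received: " + handoff.poll());
    }
}
```

The sender never waits on the consumer beyond its budget, so a long GC pause on the consuming side shows up as dropped messages instead of a missed deadline.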

Now - that sounds like a really bad idea and it probably is - but it is an interesting idea to play with.

So, please don't think I am proposing realtime programming on the standard JVM, just some ideas for larger systems integration.

wglb · 14y ago

So my prime hobby is ham radio, and my prime mode is Morse code. These days, we mostly connect computers to the radio to do logging and Morse code keying (done through the LPT port). One popular program runs under Windows. The big boys use another program that runs under DOS, and lately FreeDOS as a replacement. Running under Windows, there is an annoying hesitation once every word or so that is due to whatever is going on in Windows.

The solution, provided by the maker of the above-mentioned Windows software, is an external keying module. With this arrangement, Windows sends characters to the hardware brick, which, latency-free, sends out the characters quite nicely. (Amazing how the trained Morse-code ear can distinguish millisecond delays in character starts.)

Personally, I now think that way: a system has a component that is latency-rich connected to a component that is latency-free. This of course has implications for the overall solution.

johanbev · 14y ago

Your graphs make it very hard to compare the performance of the different collectors. What about putting all of them in the same plot, making that logarithmic, and including cumulative time (on a separate linear axis)?

NerdsCentral (OP) · 14y ago

OK - I'll see if I can get time.

CountHackulus · 14y ago

The G1 Garbage collector isn't certified for realtime operation. If you want a Java VM that's certified for HARD realtime, you can check out IBM's latest real time GC policy that I believe came out with IBM's Java 7.

NerdsCentral (OP) · 14y ago

I know - I think the post makes that clear. I explained the purpose of the whole thing in an earlier reply. But Oracle and IBM have realtime implementations of the JVM I believe, though I am more familiar with the Oracle one.

aragozin · 14y ago

G1 is still under active development, but even with an ideal implementation it wouldn't deliver the best performance.

Reason 1. G1 uses an SATB write barrier, which is more expensive than the card-marking barrier that HotSpot's other collectors use.

Reason 2. G1 has to use STW pauses to move objects around. E.g., if your heap is half full of live objects, you have to physically move 1 MiB of live data in memory to reclaim 1 MiB of free space. Cleaning sparse regions first will effectively reduce this proportion, but G1 still has to work very hard to reclaim each meg of free space.
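The arithmetic behind Reason 2 can be written down as a one-line model. Assuming a region whose live fraction is f is emptied by copying its live data elsewhere, the copy cost per byte of space reclaimed is f / (1 - f), which gives the 1 MiB moved per 1 MiB reclaimed figure at f = 0.5. This is a simplified model with a made-up class name, not a description of G1's actual region selection.

```java
// Hypothetical model of copying cost for an evacuating collector.
final class EvacuationCost {
    /**
     * Bytes that must be copied per byte of free space reclaimed when
     * evacuating a region whose live fraction is liveFraction (0 <= f < 1).
     */
    static double copiedPerReclaimed(double liveFraction) {
        if (liveFraction < 0.0 || liveFraction >= 1.0) {
            throw new IllegalArgumentException("liveFraction must be in [0, 1)");
        }
        return liveFraction / (1.0 - liveFraction);
    }

    public static void main(String[] args) {
        double[] fractions = {0.1, 0.25, 0.5, 0.75, 0.9};
        for (double f : fractions) {
            System.out.println("live=" + f + " -> copied per reclaimed byte="
                    + copiedPerReclaimed(f));
        }
    }
}
```

The ratio grows quickly as regions get denser, which is why collecting the sparsest regions first helps, and why it helps less as the heap fills up.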

If you need low-pause GC in the JVM today, use concurrent mark sweep: http://java.dzone.com/articles/how-tame-java-gc-pauses http://aragozin.blogspot.com/2011/07/gc-check-list-for-data-...

dgreensp · 14y ago

Your test is only one very specific scenario. I'm not a GC expert, but on EtherPad, which was a very complex monolithic JVM app, we had to turn to concurrent GC as the default GC just choked with any settings. If you haven't seen much difference between GC settings in the past, have you actually worked on realistic systems that strain the JVM GC, versus your contrived one?

I don't know the semantics of "real-time" (argued in other comments), but if you are coming from that angle, maybe you are looking for guarantees -- it seems like there ought to be a way to write GC so that it just works no matter what you throw at it, and it seems like G1 is not that.
