Sub-10 ms Latency in Java: Concurrent GC with Green Threads

(jet-start.sh)

154 points · haxen · 5y ago · 191 comments

jakewins · 5y ago · 22 in thread

For years of my life, all I thought about was stuff like this. If you've ever run latency-sensitive systems on the JVM... man, is it ever a pain.

Who was it that turned GC off entirely, minimized allocation and just restarted their VMs when they ran out of RAM every couple of hours, was that Netflix?

Either way. It makes me excited for Rust and the languages it'll inspire, all this labor gone away.

manuelabeledo · 5y ago

> Who was it that turned GC off entirely, minimized allocation and just restarted their VMs when they ran out of RAM every couple of hours, was that Netflix?

Every single financial firm out there, using Java for sub-microsecond tasks. Really, there is no other way to keep low latencies if you have your GC messing around every few milliseconds.

This may surprise some people, but Java is ubiquitous in low latency environments such as trading firms. It offers performance close enough to C++, but the developer pool is way larger. Also, when one needs to comply with extremely low latency requirements, the way to go is always dedicated hardware anyway.

If you are interested in the topic, there is this project called OpenHFT that aims to provide high frequency trading tools for Java. Particularly, their Chronicle Queue implementation tries to handle the GC latency issue by storing stuff off heap. Its co-founder, Peter Lawrey, has also delivered a handful of good talks about low latency Java.
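
To make the off-heap idea above concrete, here is a minimal sketch in plain Java. It is not Chronicle Queue's API; `OffHeapTickStore` and its record layout (a timestamp plus a price) are hypothetical names chosen for illustration. It only assumes the standard `ByteBuffer.allocateDirect`, which reserves memory outside the garbage-collected heap.

```java
import java.nio.ByteBuffer;

// Hypothetical fixed-size record store whose payload lives outside the Java heap.
final class OffHeapTickStore {
    // One record = 8-byte timestamp followed by 8-byte price.
    private static final int RECORD_BYTES = Long.BYTES + Double.BYTES;

    private final ByteBuffer buffer;
    private final int capacity;
    private int count;

    OffHeapTickStore(int capacity) {
        this.capacity = capacity;
        // A direct buffer's contents are native memory, so the collector
        // has no per-record objects to trace or relocate.
        this.buffer = ByteBuffer.allocateDirect(capacity * RECORD_BYTES);
    }

    /** Appends a record without allocating; returns false when the store is full. */
    boolean append(long timestampNanos, double price) {
        if (count == capacity) {
            return false;
        }
        int base = count * RECORD_BYTES;
        buffer.putLong(base, timestampNanos);       // absolute write, no position change
        buffer.putDouble(base + Long.BYTES, price);
        count++;
        return true;
    }

    long timestampAt(int index) {
        return buffer.getLong(checkIndex(index) * RECORD_BYTES);
    }

    double priceAt(int index) {
        return buffer.getDouble(checkIndex(index) * RECORD_BYTES + Long.BYTES);
    }

    int size() {
        return count;
    }

    private int checkIndex(int index) {
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return index;
    }

    public static void main(String[] args) {
        OffHeapTickStore store = new OffHeapTickStore(1024);
        store.append(System.nanoTime(), 101.25);
        System.out.println(store.size() + " tick(s), first price " + store.priceAt(0));
    }
}
```

The trade-off is that records become raw bytes at fixed offsets rather than objects, which is roughly the "writing C in Java" style that comes up later in this thread.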

vgatherps · 5y ago

As someone in the industry I always find the claim that Java is ubiquitous in low latency trading fairly strange.

The only firm that widely uses Java over C or C++ on the HFT fast path is Virtu, and the sense I get from some of my friends there is that they regret the decision (and now it’s mostly used to guide FPGAs).

There is a TON of Java usage in software that would have been considered HFT maybe 10 years ago, especially at big banks, but the fastest I’ve ever seen somebody make a Java trading program that actually handled the whole tick-to-trade path is 7us, which would be at best ok by C or C++ trading platform standards.

I’m sure you could write some dumb trigger program operating on off-heap bytes that had similar performance characteristics to a C HFT program, but unless you accomplish this via code generation then you’re just writing C in Java with a runtime actively battling your goals.

Edit: Jane Street uses OCaml + FPGAs but they aren’t really in the HFT business in the same way that say Virtu or Tower is.

2 more replies

dan-robertson · 5y ago

The part of GC which causes the most latency issues is compaction rather than merely collection. Using a language like Rust won’t help if you have memory fragmentation, and indeed allocation tends to be much faster with a GC than with malloc. I think the advantages of Rust have more to do with often avoiding heap allocation entirely (and predictably), with value semantics leading to fewer small allocations, and with the language’s semantics not forcing you to choose between more reliable allocation-heavy immutable-everything code and faster harder-to-think-about mutation-heavy code.

gwbas1c · 5y ago

IMO, the main advantage of Rust is that it doesn't require an extensive runtime in order to have memory safety. This allows you to write a library like an image processor or embedded database without using C / C++.

Otherwise, if you wrote your image processor in C# or Java, it becomes hard to call your library from Python or Node because you have to require the entire VM. Likewise, you can ship an application binary that has no requirements for a runtime. (Your application binary doesn't require a JVM, CLR, Mono, Python, Node, or some other runtime.)

I've been through the Rust book twice but I'm just getting to the point of trying to write something in it. The mental model is very different. Coming from C# / Java / JavaScript / Objective-C, I'm wondering how many hours I need before I can get my head into Rust?

3 more replies

nestorD · 5y ago

One thing I love about Rust is that allocations are very explicit and easily spotted, which helps a lot when one wants to avoid them.

I found C++ to be treacherous around corner cases on this subject.

2 more replies

darksaints · 5y ago

Don't forget scanning. Yes, moving blocks of memory around is expensive, but it can also be done concurrently. Scanning, AFAIK, cannot be done concurrently, and thus remains the primary blocker to lower latency. And scanning is something that is entirely eliminated with static memory management.

6 more replies

bananaface · 5y ago

Can't you just allocate a huge block up-front and throw stuff into it with a custom allocator? I don't know if Rust allows you to do that kind of thing.

1 more reply

AtlasBarfed · 5y ago

So arena allocation and buffer reuse?
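
For readers unfamiliar with the pattern, a minimal sketch of buffer reuse in Java might look like the following. `BufferPool` is a hypothetical name, not a library class, and the sketch assumes single-threaded use: every buffer is allocated once at startup and then recycled, so the steady state creates no garbage.

```java
import java.util.ArrayDeque;

// Hypothetical fixed-capacity pool: buffers are allocated up front and recycled.
// Not thread-safe; a concurrent version would need a different free list.
final class BufferPool {
    private final ArrayDeque<byte[]> free;
    private final int bufferSize;

    BufferPool(int buffers, int bufferSize) {
        this.bufferSize = bufferSize;
        this.free = new ArrayDeque<>(buffers);
        for (int i = 0; i < buffers; i++) {
            free.push(new byte[bufferSize]); // the only allocations happen here
        }
    }

    /** Returns a pooled buffer, or null when the pool is exhausted (no fallback allocation). */
    byte[] acquire() {
        return free.poll();
    }

    /** Hands a buffer back for reuse; the caller must not touch it afterwards. */
    void release(byte[] buffer) {
        if (buffer.length != bufferSize) {
            throw new IllegalArgumentException("buffer does not belong to this pool");
        }
        free.push(buffer);
    }

    int available() {
        return free.size();
    }

    public static void main(String[] args) {
        BufferPool pool = new BufferPool(4, 256);
        byte[] buf = pool.acquire();
        buf[0] = 42;              // use the buffer
        pool.release(buf);        // recycle instead of leaving it for the GC
        System.out.println("free buffers: " + pool.available());
    }
}
```

Returning null on exhaustion is a deliberate choice in this sketch: it makes running out of buffers visible instead of silently falling back to heap allocation.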

pron · 5y ago

In case you missed this post and the previous ones, here's the news: this labour is gone on the JVM. The GCs in JDK 14-15 are good enough pretty much out of the box, and they're getting better very quickly now.

OTOH, if you think other languages let you do away with a GC without pretty significant extra work, especially in concurrent systems, well, then you haven't had experience with those languages.

tjoff · 5y ago

That statement has been said for pretty much every release of any GC.

I'd say the opposite is true. Relying on GC requires significant extra work. Because you always need to think about memory (exception: small script-like applications). The only thing a GC does is enable you to not think about it, but the moment you don't, you will write bad code and realize it was a disservice all along. And by then it is too late.

So in a GC language you need to constantly be aware of when you take something for granted. Which is more work than just doing it manually yourself.

3 more replies

karlmdavis · 5y ago

I mean... Rust literally doesn't have a GC, in the usual sense -- certainly nothing resembling most GCs' generations, mark, sweep, etc. approaches.

Steve lays it out far better than I could, coining the term "static garbage collection": https://steveklabnik.com/writing/borrow-checking-escape-anal....

And to your "without pretty significant extra work" qualifier: I really don't find that to be true with Rust. The initial learning curve was a bit rough, but certainly far less so than other languages/platforms I've picked up (hello, ML). In the end, I find that it's just a nice, helpful, productive, and ludicrously performant language.

2 more replies

VHRanger · 5y ago

In D, the GC is guaranteed to only ever run if you allocate new memory on the heap.

This is still a lot of work for a video game (because you never want any latency, the only way to achieve this is with an arena allocator or going full @nogc).

But for apps where the latency requirement is bounded, D doesn't make it hard and the language is nice and ergonomic.

1 more reply

manigandham · 5y ago

That's the trade-off. Predictability and determinism vs development and testing effort.

The memory work remains the same, you can do it yourself or let the GC handle it. For 99% of applications, the GCs are good enough and getting better every year, but low-latency still needs predictability and would ideally choose manual memory management.

The realities of the job market and IT deployments are different though and that's why we still have JVMs involved with low-latency scenarios because of talent, tooling and productivity.

twic · 5y ago

They're good enough for some purposes, but not for others. If you need <10 us latency, you still have to jump through hoops to get that on a JVM.

1 more reply

nradov · 5y ago

You can also use the Real-Time Specification for Java (RTSJ). Essentially it allows you to do all your memory allocation, then shut off the GC.

https://jcp.org/en/jsr/detail?id=282

CyberDildonics · 5y ago

If something is so latency-sensitive and crucial that the Java garbage collector becomes such a hindrance, why not start writing parts in C++? It seems to me people end up getting into a situation where they are fighting with the Java GC to try to get low latency with huge heaps and constant allocation when it really is not difficult to control memory allocation in modern C++.

nvarsj · 5y ago

> Who was it that turned GC off entirely, minimized allocation and just restarted their VMs when they ran out of RAM every couple of hours, was that Netflix?

This was common practice in trading firms that got on the Java hype train. Turn off GC and just restart the JVM outside of trading hours.

> Either way. It makes me excited for Rust and the languages it'll inspire, all this labor gone away.

The JVM gives GC a bad name. There are plenty of GC languages which don’t have the level of pain of HotSpot except in extreme cases. For the vast majority of GC languages you never even think about it. Rust / C++ are great when you need full control but it’s not necessary for most things.

pron · 5y ago

No runtime has a better GC than Java's, certainly in the current version (14) which is worlds better than Java 8. The reason you don't think about it in other languages is because if you use other languages, you probably don't care too much about performance in the first place.

3 more replies

darksaints · 5y ago

I'd say it's more the language design than anything. Heap allocating everything, and then throwing in inheritance 30 levels deep makes for some very poor GC behavior.

1 more reply

def_true_false · 5y ago

Perhaps Instagram? Or Twitch? Or Discord?

https://news.ycombinator.com/item?id=23144380

lostmyoldone · 5y ago

I know I read about some trading firm(s) doing that, but I don't know which, if it was even stated.

Whoever they were, they were rotating pre-warmed JVM images with disabled GC, and were reaching quite respectable latency figures.

To not have to recycle them quickly, you'll want to not generate too many garbage objects, and that's actually easier than one might think in Java. Especially if you accept restarting the JVM from time to time, as you only need to be mostly statically allocated.

Rare error paths can freely use dynamic allocation as long as most of the service doesn't.

Nowadays you can also get away without using strings in most places, using only char sequence flyweights over "statically" allocated char sequences. Otherwise strings were a pain, especially APIs that really don't need a string (ownership) but had string method arguments nevertheless.

Used like that, as you would on an embedded platform, there's nothing I know of that actually beats the JVM in raw performance while still being somewhat practical in terms of tooling and hiring. Rust might take that crown, we'll see, but I hope so.
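
The "char sequence flyweight" mentioned above can be sketched roughly as follows. `ByteCharView` is a hypothetical class, and the sketch assumes the underlying bytes are single-byte (Latin-1/ASCII) text: one reusable view object is repointed at different slices of a preallocated byte array instead of creating a new `String` per field.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical reusable view over single-byte text; wrap() repoints it without allocating.
final class ByteCharView implements CharSequence {
    private byte[] data = new byte[0];
    private int offset;
    private int length;

    /** Points this view at data[offset, offset + length) and returns it for chaining. */
    ByteCharView wrap(byte[] data, int offset, int length) {
        if (offset < 0 || length < 0 || offset + length > data.length) {
            throw new IndexOutOfBoundsException("offset " + offset + ", length " + length);
        }
        this.data = data;
        this.offset = offset;
        this.length = length;
        return this;
    }

    @Override
    public int length() {
        return length;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return (char) (data[offset + index] & 0xFF); // one byte maps to one Latin-1 char
    }

    // subSequence and toString allocate, so they belong off the hot path.
    @Override
    public CharSequence subSequence(int start, int end) {
        if (start < 0 || end > length || start > end) {
            throw new IndexOutOfBoundsException("start " + start + ", end " + end);
        }
        return new ByteCharView().wrap(data, offset + start, end - start);
    }

    @Override
    public String toString() {
        return new String(data, offset, length, StandardCharsets.ISO_8859_1);
    }

    /** Allocation-free comparison against any other CharSequence. */
    boolean contentEquals(CharSequence other) {
        if (other.length() != length) {
            return false;
        }
        for (int i = 0; i < length; i++) {
            if (charAt(i) != other.charAt(i)) {
                return false;
            }
        }
        return true;
    }
}
```

A parser would typically keep one such view per field and call `wrap` for each incoming message, which is why APIs that insist on `String` parameters defeat the technique.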

OneWay43235392 · 5y ago

>> It makes me excited for Rust

Here we go again ...

MaxBarraclough · 5y ago · 5 in thread

Doesn't seem helpful to use the term green threads here. This isn't a JVM with green threads (those are a thing of the past as I understand it). They're using plain old OpenJDK, and they're ensuring the GC gets a CPU core to itself.

Neat that they were able to get a dramatic improvement in GC latencies on both G1 and ZGC.

No mention of the Shenandoah GC. Would the same trick help out there too?

haxen (OP) · 5y ago

We did measure on Shenandoah as well, it helped but not enough to be within 10 ms. Since this post is about Hazelcast Jet getting the best latency, we didn't report that.

MaxBarraclough · 5y ago

Interesting, thanks.

brabel · 5y ago

> This isn't a JVM with green threads

Exactly... I thought they may be talking about Project Loom's Virtual Threads (which are going to be true green Threads) which are available experimentally as of Java 15, given they did use Java 15, but nothing in the post indicates they used that.

haxen (OP) · 5y ago

We use the same technique of cooperative multithreading under any name, but without the low-level support to be able to write plain sequential Java code. However, even though that changes our internal programming model, the behavior with respect to native threads, interactions with the OS scheduler, CPU caches, etc., should be identical.

1 more reply

tyingq · 5y ago

They say they are using this: https://hazelcast.com/blog/idle-green-threads-in-jet/

And then say it's comparable. "This basic design is also present in the concepts of green threads and coroutines. In Hazelcast Jet we call them tasklets."
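
As a rough illustration of the cooperative design described in this subthread, the sketch below runs several small units of work round-robin on one native thread. The names `CooperativeWorker`, `Tasklet`, `step`, and `countdown` are hypothetical and are not Hazelcast Jet's actual API; the only assumption carried over from the discussion is that each unit does a short, non-blocking piece of work and then returns control.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical cooperative scheduler; names are illustrative, not Hazelcast Jet's API.
final class CooperativeWorker {

    /** A unit of work that performs one short, non-blocking step and returns true once finished. */
    interface Tasklet {
        boolean step();
    }

    private final List<Tasklet> tasklets = new ArrayList<>();

    void add(Tasklet tasklet) {
        tasklets.add(tasklet);
    }

    /** Runs all tasklets round-robin on the calling thread until every one reports completion. */
    void runToCompletion() {
        while (!tasklets.isEmpty()) {
            for (Iterator<Tasklet> it = tasklets.iterator(); it.hasNext(); ) {
                if (it.next().step()) {
                    it.remove(); // finished tasklets leave the rotation
                }
            }
        }
    }

    /** Example tasklet: records its name and remaining count on each step, then finishes at zero. */
    static Tasklet countdown(String name, int steps, StringBuilder trace) {
        int[] remaining = {steps};
        return () -> {
            trace.append(name).append(remaining[0]).append(' ');
            return --remaining[0] <= 0;
        };
    }

    public static void main(String[] args) {
        CooperativeWorker worker = new CooperativeWorker();
        StringBuilder trace = new StringBuilder();
        worker.add(countdown("A", 2, trace));
        worker.add(countdown("B", 3, trace));
        worker.runToCompletion();
        System.out.println(trace); // interleaved steps of A and B
    }
}
```

Because scheduling happens in user code, the number of native threads stays fixed regardless of how many tasklets exist, which is what makes it possible to leave a core free for the GC as discussed above. A tasklet that blocks would stall every other tasklet on the same worker.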

ryanthedev · 5y ago · 5 in thread

Can we talk about no async/await support? Can't build scalable apps when I'm in callback hell. It's like ES5 all over again.

kasperni · 5y ago

Java is getting virtual threads/fibers instead of async/await.

From https://cr.openjdk.java.net/~rpressler/loom/Loom-Proposal.ht...

-------------

An alternative solution to that of fibers to concurrency's simplicity vs. performance issue is known as async/await, and has been adopted by C# and Node.js, and will likely be adopted by standard JavaScript. Continuations and fibers dominate async/await in the sense that async/await is easily implemented with continuations (in fact, it can be implemented with a weak form of delimited continuations known as stackless continuations, that don't capture an entire call-stack but only the local context of a single subroutine), but not vice-versa.

While implementing async/await is easier than full-blown continuations and fibers, that solution falls far too short of addressing the problem. While async/await makes code simpler and gives it the appearance of normal, sequential code, like asynchronous code it still requires significant changes to existing code, explicit support in libraries, and does not interoperate well with synchronous code. In other words, it does not solve what's known as the "colored function" problem.

ryanthedev · 5y ago

This is some sauce. Ty.

lostmyoldone · 5y ago

I have some trouble understanding what people mean by scalable today, especially why people seem to have to run entirely event driven, and not mostly on the socket read/write edges?

Almost 20 years ago we pushed >10k messages per second on a JVM, using essentially Pentium Pro class hardware, with the messages spread over thousands of TCP consumers, yielding average latencies well below 0.5 seconds. Not really low latency, but low enough that we didn't need much lower.

This was on a purely blocking implementation, because that was before almost anyone did anything like that on the JVM.

With the advancement in async IO, it's got to be possible to drive many millions of sockets, or have really low latency targets before you have to start being really careful?

So what are you guys doing that seems to need so much async code?

With that I mean actual async code, not code having locks but pretending to be async by using callbacks everywhere, because that's somewhat common.

I'm not trying to be rude, I honestly don't get what people are doing that needs more than the JVM should rather easily provide, unless possibly you have super low latency targets?

There seem to be too many that have to resort to quite cumbersome implementation strategies, so I'm starting to think there's some corner of the industry which I have completely missed, and which requires these strategies regularly?

capableweb · 5y ago

> Can't build scalable apps when I'm in callback hell

What? We (JavaScript developers and others dealing with asynchronous patterns) have been able to build scalable apps (in terms of code and its maintenance) for many many years, probably 10+.

async/await is simply syntactic sugar and doesn't drastically change anything, you still need to understand the asynchronicity underneath it all, and if you do, you won't have any problems building scalable apps using your knowledge.

ryanthedev · 5y ago

Pretty sure a callback vs synchronous style of coding is a little more than syntactic sugar.

It's a complete shift in application design...

If I have to flatmap one more time...
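
To show the difference being argued about, here is a small sketch of the same two-step lookup written both ways with the standard `CompletableFuture`. The class `CallbackVsSequential` and the helpers `fetchUser` and `fetchBalance` are made up for illustration and stand in for real asynchronous calls.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical two-step lookup in both styles; fetchUser and fetchBalance are invented stand-ins.
final class CallbackVsSequential {

    static CompletableFuture<String> fetchUser(int id) {
        return CompletableFuture.supplyAsync(() -> "user-" + id);
    }

    static CompletableFuture<Integer> fetchBalance(String user) {
        return CompletableFuture.supplyAsync(() -> user.length() * 10);
    }

    // Callback style: each dependent step is chained with thenCompose,
    // which plays the role of flatMap for CompletableFuture.
    static CompletableFuture<String> reportAsync(int id) {
        return fetchUser(id).thenCompose(user ->
                fetchBalance(user).thenApply(balance -> user + " has " + balance));
    }

    // Sequential style: the same logic, blocking the calling thread at each join().
    static String reportBlocking(int id) {
        String user = fetchUser(id).join();
        int balance = fetchBalance(user).join();
        return user + " has " + balance;
    }

    public static void main(String[] args) {
        System.out.println(reportAsync(7).join());
        System.out.println(reportBlocking(7));
    }
}
```

The sequential version reads like ordinary code but ties up a thread while it waits. The Loom proposal quoted earlier in this subthread aims to make that kind of blocking cheap by running it on virtual threads, so that the chained form is not needed for scalability.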

1 more reply

tupac_speedrap · 5y ago · 4 in thread

Glad to see some love for legacy programming languages like Java.

pron · 5y ago

Well, that's understandable because there are still some legacy companies and organisations left that write a lot of new software in Java, like Apple, Amazon, Netflix, Twitter, Google, Alibaba, Tencent, NASA, GitHub, Microsoft, Facebook, Spotify and nearly all Fortune 500 companies. Plus, if you care about both performance and observability, there aren't many viable alternatives.

BTW, many if not most of the cutting-edge advances in compilation, low-overhead deep profiling, and garbage collection are done on the Java platform, so it's still the technology leader in those areas.

shock · 5y ago

I don't think Java qualifies as legacy. I view it rather being in the 'mature' stage of programming languages evolution.

pjmlp · 5y ago

If Java is legacy with 25 years, what to say about C++ with 50, or C with 60, Python and Ruby both around 30.

Ah and the beloved OS around here is reaching 60 as well.

Plenty of legacy love.

dionian · 5y ago

The Java language is Java's least appealing feature. Fortunately there are seamless alternatives!

hansdieter1337 · 5y ago · 2 in thread

I want to see Java on mission critical computers in space!

pjmlp · 5y ago

Not in space, but still mission critical:

"Aegis Battleship Weapons System"

http://www.artist-embedded.org/docs/Events/2011/JTRES/Slides...

"French radar system for ballistic missile tracking and measurement"

https://www.militaryaerospace.com/defense-executive/article/...

"NASA Ground Control Station for Multiple UAVs Flight Simulation"

https://www.semanticscholar.org/paper/Ground-Control-Station...

Rebelgecko · 5y ago

I've seen plenty of ground station and mission control software in Java. Actually in space a bit less likely though... A lot of that is running in RTOS environments that Java isn't well suited for (forget garbage collection, some flight software projects go as far as to ban dynamic memory allocation)

ginko · 5y ago · 2 in thread

I would certainly hope that green threads have less than 10 megaseconds latency.

hnarn · 5y ago

Was the title changed? Because it says "ms" both here and on jet-start.sh and that unit is milliseconds.

ginko · 5y ago

It used to be 'Ms'. Guess it's fixed now.

jjav · 5y ago

Even on ten-year-old hardware, single-digit ms latency in Java server apps wasn't very special. Java (JVM) is an extremely performant platform, so I always find it odd how a meme has somehow built up about it being otherwise.

True that one can end up writing terribly inefficient Java code, but one can write terrible code in any language. If I need to write server code where performance is particularly important and I don't want to deal with the cost (in debug time and dev expertise) of C or C++, Java would be my first choice.

Also I'm of the school of thought that performance always matters. Autoscaling in cloud providers sure makes it easy to scale horizontally to make up for slow server code, but once you reach a certain size, go have a chat with the finance team about the AWS bill.
