Leveraging Rust in our Java database

(questdb.io)

151 points · jerrinot · 2y ago · 88 comments

amunra__2y ago· 13 in thread

I'm the author of the blog post.

The article really focuses on JNI in Rust.

I see most questions are about "Why did you not use X language instead?", so let me try and address this.

To answer the "Why not just Rust?" question, I should first mention that when the project started, Rust was still in its early days (before 1.0), and choosing an emerging language would have been a risky bet.

The project was started by Vlad (our CEO) who had a background writing high performance Java in the trading space. The Zero-GC techniques - whilst uncommon in open source software - are mature and a staple of writing high performance code in the financial industry. The product evolved organically, feature after feature.

I personally joined the team from a C and C++ background, having previously moved from a project that suffered from minute-long compile times from single .cpp files due to template overuse. Whilst I do miss how expressive high-level C++ can be, Java has really good tooling support.

When writing systems-style software, most of what matters in terms of performance is how we make system calls, manage memory, and debug and profile. This is an area where Java really shines. Don't get me wrong: In the absolute sense I think C++ tools tend to be better (Linux Perf is awesome!), but Java tooling is _there_. IntelliJ makes it trivially easy to run a test under the debugger reliably and consistently. It's equally easy to run a profiler and to get code coverage. The same tools work across all platforms too, might I add. It's not necessarily better, but it's easier. While a little quaint, Java has turned out to be a pretty good choice in practice, in my opinion.

Times have moved on. The Rust community really cares about tooling, and it's one of the reasons why we've picked it over expanding our existing C++ codebase: We just want to get stuff done and have enough time left in our dev cycle to properly debug and profile our code.

belter2y ago

Thanks for the interesting post. Do you plan to use JEP 442 in the future? https://openjdk.org/jeps/442

amunra__2y ago

When the time is right. There's finally new APIs coming in the Java space that will make native-code interop easier and more reliable.

Our open source database edition can also be used embedded though, so we can only upgrade at the pace of our customers and because of that we still are compatible all the way down to Java 8.

Were it not for this detail, we'd probably consider it a lot sooner.

HdS842y ago

Do you know RavenDB? It's a document DB written almost entirely in C#. It's incredible how fast it can get, but as you said, it does not look like typical LOB code at all.

goostavos2y ago

Do you have any background reading for high performance java or how it's used in the finance world? I had no idea it was used in this niche. As a crufty Java dev who types `new` everywhere and never gives thought to GC, squeaking out high performance sounds like an interesting side of the language.

amunra__2y ago

Maybe we should write a blog post, though there's gotta be one out there for this already.

The short of it is: Learn C. Learn your system calls. Learn JNI. Learn about sun.misc.Unsafe. Learn about the disruptor pattern. Learn how to pool objects. The long type can pack a lot of data. Go from there!
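To make the "the long type can pack a lot of data" point concrete, here is a minimal sketch of packing two 32-bit ints into a single primitive long, which avoids allocating a small pair object. `PackedPair` and its methods are hypothetical names chosen for illustration; this is not QuestDB code.

```java
// Hypothetical helper for illustration only (not from QuestDB).
// Stores two ints in one long so no pair object is ever allocated.
final class PackedPair {
    private PackedPair() {}

    // High 32 bits hold `hi`, low 32 bits hold `lo`.
    static long pack(int hi, int lo) {
        // Mask `lo` so its sign extension does not overwrite the high half.
        return ((long) hi << 32) | (lo & 0xFFFF_FFFFL);
    }

    // Arithmetic shift keeps the sign of the high half.
    static int hi(long packed) {
        return (int) (packed >> 32);
    }

    // Narrowing cast keeps only the low 32 bits.
    static int lo(long packed) {
        return (int) packed;
    }
}
```

A packed value can live in a `long[]` or in a primitive field, so it never becomes a separate object for the GC to track.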

Yeroc2y ago

Check out Peter Lawrey's blog [1], which has lots of excellent content coming from a high performance trading background. There's also some older (probably dated) content on Martin Thompson's Mechanical Sympathy [2] blog.

[1] http://blog.vanillajava.blog/

[2] https://mechanical-sympathy.blogspot.com/

lenkite2y ago

Why did you not consider leveraging Java's recent Foreign Function and Memory API ?

bluestreak2y ago

One of our distribution channels is Maven Central, where we ship a Java 11 compatible library. Embedded users preclude us from leveraging the latest Java features.

Yeroc2y ago

FYI, even in the most recent Java LTS release (21) this is still flagged as a "preview" feature, so you're unlikely to see production applications using it yet.

ahoka2y ago

So Rust combines the expressiveness of C++ with the ease of development of Java?

amunra__2y ago

I'd say expressiveness of C++ with productivity of Java. Rust is indeed not easy to learn.

A little example. Yesterday I was making changes to our C++ client library and I wanted to improve the example in our documentation.

We use a dedicated protocol called ILP for streaming data ingestion and each of the inserted rows has a designated timestamp.

In the Rust example, I added support for chrono::DateTime and it was trivially easy for me to add a timestamp for a specific example date and time: Utc.with_ymd_and_hms(1997, 7, 4, 4, 56, 55).

Our C++ library instead takes an std::chrono::time_point. I wanted to use the same datetime. As far as I can tell it requires first going through the old C "struct tm" type (which is local and not UTC), then converting to "time_t", then converting to UTC via gmtime and then constructing a time_point from that. After 10 minutes the code got too long and complicated, so I just substituted it with a timestamp specified as an int64_t in nanoseconds.

Don't get me wrong, the C++ time_point is a work of art in how flexible it is, but unnecessarily complicated in most cases.
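For comparison, the same UTC instant can be turned into an int64-style nanosecond timestamp with Java's standard java.time API. This is only a sketch: `IlpTimestampSketch` and `epochNanosUtc` are hypothetical names, and the code does not use any QuestDB client library.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Hypothetical helper for illustration only.
final class IlpTimestampSketch {
    private IlpTimestampSketch() {}

    // Interprets the fields as a UTC date-time and returns nanoseconds since the Unix epoch.
    static long epochNanosUtc(int year, int month, int day, int hour, int minute, int second) {
        Instant instant = LocalDateTime.of(year, month, day, hour, minute, second)
                .toInstant(ZoneOffset.UTC);
        // The *Exact methods throw ArithmeticException instead of silently overflowing.
        return Math.addExact(
                Math.multiplyExact(instant.getEpochSecond(), 1_000_000_000L),
                instant.getNano());
    }
}
```

Calling `IlpTimestampSketch.epochNanosUtc(1997, 7, 4, 4, 56, 55)` yields the nanosecond value for the example date used above.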

I should add that I also spent 45 minutes yesterday debugging a CMake issue.

Rust is not easy to learn, but it's just more modern and productive.

C++ is still great if you've got a massive team, but at our scale I don't think it makes any sense.

jandrewrogers2y ago

Rust development is Java-esque in some ways, I think that is a fair characterization, but the Rust language is noticeably less expressive than C++. The relative lack of expressiveness has been a stumbling block for use in some domains, because modern C++ implementations require a fraction of the code to do the same thing. This disparity doesn't show up for all types of code, so it is not uncommon to see both Rust and C++ used in the same org depending on what the code is trying to do. They have different strengths.

kaba02y ago

I wouldn’t call Rust easy to develop in.

wmiel2y ago· 12 in thread

I'm not sure I understand why they're using Java if they avoid GC. It doesn't sound like the best fit, especially since foreign memory in Java isn't too pleasant to work with.

Joe8Bit2y ago

"Hard to write but easy for customers to deploy" is my guess. There are a bunch of very high-performance computing use-cases in finance (quant, HFT) that Java gets used for pretty routinely. That's a very attractive market to build primitives like databases for, they have very deep pockets, but you need to play in their ecosystem.

I've seen this "GC-less" Java in those use-cases quite a bit. From a conceptual design POV it's likely not the best approach, but there's a lot of sunk cost in that eco-system and a lot of trust and expertise where "Choosing a better language" is often several orders of magnitude more expensive.

cmrdporcupine2y ago

They're not the only ones to have done this in this space. VoltDB (Michael Stonebraker of Postgres [among other things] fame) did this -- low or no-GC style Java, effectively non-idiomatic Java, but taking advantage of the Java runtime in other ways.

Others have done the same. And as others have pointed out, there's things outside the DB domain in high frequency trading and the like that have done this as well.

There are advantages to Java: mature runtime, large talent pool out there, good tooling (still haven't seen anything as good as JMX for any other runtime). And if there's any language whose GC could be tuned to be "responsible", it'd be the JVM; there's been more GC R&D in the JVM than in any other runtime.

I worked at RelationalAI (another DB vendor) for a bit, and their DB is all written in Julia, another garbage collected language... and the GC in Julia is what I'd characterize as ... immature... for that kind of application. I would have loved to have access to the JVM's GC there.

Also this looks to be more of an analytical, column oriented, database. So I can imagine they're optimizing more for throughput than transactional latency. (I could be wrong, correct me, Quest folks...)

And choice of Java likely has to do with when they began working on the project and what was out there at the time. It's the real world of software eng. We work with the tools and people we have because shipping a product on time and bringing in $$ is more important than anything else. I don't know when they got started, but Rust has only matured to "mainstream" stability/acceptance in the last 2-3 years.

Finally, DBs often have a very layered architecture and they could easily compartmentalize pieces such that latency sensitive bits could be done in native Rust. They're not apparently doing this, but I could see them doing things like moving the page buffer or column indices or storage engine over to Rust over time for performance benefits.

All power to them, it's great to see them working with Rust. (aside: my email history looks like I spoke to a recruiter there at some point, maybe, but didn't interview? I think if I'd known they were playing with Rust I would have given that more attention...)

nhourcard2y ago

Also this looks to be more of an analytical, column oriented, database. So I can imagine they're optimizing more for throughput than transactional latency.

Yes that is the case

usrusr2y ago

Seamless integration with parts that don't need to be GC-free comes to mind: they are not building an application, they are building a building block. And that building block can be used both in applications that do require the latency guarantees of GC-free as well as in applications that don't. Another class of applications would be ones that alternate between phases of unpredictable latency (like bootup or reconfiguration) and low-latency operation.

ilikerashers2y ago

I remember reading that the founder was working in low-latency Java development with London investment banks for years. I guess it's what he knew.

Also, Rust is a hard language to start a company with so I wouldn't be surprised if this is more of a product maturity thing.

skippyboxedhero2y ago

Surely it is at least as hard to find people who know how to write Java without GC?

Presumably you can't use Hotspot so you have to write your own VM too?

jabradoodle2y ago

It's not that uncommon a move, to choose jumping through hoops with Java over writing C.

Might become less common now that Rust is reaching the level it is.

cmrdporcupine2y ago

Having worked on writing DB internals in both Rust and in other languages, I can say that there's huge time-saving advantages to having something higher-level & garbage collected at the layer of the query parser/analyzer/compiler. The borrowing/ownership semantics can get really snaky when dealing with complicated expression trees, iteration patterns, etc.

It's fairly hard to write ergonomic interfaces for more complicated iteration patterns in Rust while still respecting safety. That's actually fine and by design, and it's possible with a lot of effort and thought but this is not as much of a concern in e.g. Java. E.g. skim the discussion on this proposed "cursor" API for Rust's stdlib BTree: https://github.com/rust-lang/rust/issues/107540

(And while Rust's enum-based algebraic types & pattern matching are nice, they're actually fairly limited when compared to what you can find in e.g. Scala or F#, Haskell, etc.)

But I think there's also huge win in doing something like the pager/buffer pool/storage/data structure/indexes layer in Rust. For safety and efficiency reasons.

MattPalmer10862y ago

Yeah, sounds like a lot of effort.

The only thing they say that explains it is they end up with a single jar file, whose only dependency is the JRE.

So I guess they get platform independence and easy installation.

jerrinotOP2y ago

QuestDB engineer here: We use jlink to create images for selected platforms. This means not even JRE is a dependency: You unpack a tarball and you are good to go. See: https://questdb.io/docs/get-started/binaries/

nu11ptr2y ago

Keep in mind the JNI libs are platform specific. That means available platforms are a function of what the JRE runs on AND they have built the shared lib for (and bundled in to the jar)

_ZeD_2y ago

you know what fatjars are?

pron2y ago· 7 in thread

> We seldom use the new keyword and objects are designed to be pooled and reused.

Just note that depending on the selection of the GC, this kind of usage may make the GC work more than when allocating new objects, not less. In particular, with the newer GCs -- G1 and ZGC -- mutating existing objects may be more costly than allocating new ones depending on circumstances. In general, the new GCs are optimised to work the least and give the best performance when the allocation rate is neither too high nor too low; the new GCs also reuse memory better than object pools. Reusing objects also precludes scalarization optimisations, i.e. not every `new Foo` actually results in a heap allocation, and can be optimised to work directly in registers.

So while on very old JVMs (such as Java 8) a "zero allocation" strategy may result in better performance and in "zero GC", on newer JVMs it may result in worse performance and more GC work (in fact, it will almost surely not yield zero GC). While it depends on many variables, I would advise against a zero allocation strategy on newer JVMs as the default path toward better performance or even better latency. It's an approach that seems to be very strongly coupled to the way the JVM was designed over a decade ago, but a lot has changed since then.
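The two styles being contrasted can be sketched as follows. All names here (`Vec2Sketch`, `AllocationStylesSketch`) are hypothetical. Whether the JIT actually scalar-replaces the fresh allocation depends on the JVM version, inlining and escape analysis, so this shows the shape of the code and makes no performance claim.

```java
// Hypothetical types for illustration only.
final class Vec2Sketch {
    double x;
    double y;

    Vec2Sketch(double x, double y) {
        this.x = x;
        this.y = y;
    }
}

final class AllocationStylesSketch {
    // "Zero allocation" style: one long-lived scratch object reused on every call.
    // Shared mutable state, so this variant is not thread-safe.
    private static final Vec2Sketch SCRATCH = new Vec2Sketch(0, 0);

    private AllocationStylesSketch() {}

    // Idiomatic style: a short-lived object that never escapes this method.
    // It is a candidate for scalar replacement, in which case no heap
    // allocation happens at all.
    static double lengthFresh(double x, double y) {
        Vec2Sketch v = new Vec2Sketch(x, y);
        return Math.sqrt(v.x * v.x + v.y * v.y);
    }

    // Pooled style: mutate the long-lived object instead of allocating.
    static double lengthReused(double x, double y) {
        SCRATCH.x = x;
        SCRATCH.y = y;
        return Math.sqrt(SCRATCH.x * SCRATCH.x + SCRATCH.y * SCRATCH.y);
    }
}
```

Both methods return the same result; the difference is only in how much work each one leaves for the JIT and the collector, which is what would need measuring on a given JVM.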

Additionally, Java now offers manual memory management and efficient FFI that are significantly better than what JNI offered: https://openjdk.org/jeps/442

doctorpangloss2y ago

Finance and video games both have really stubborn engineering traditions.

> https://openjdk.org/jeps/442

This succinctly is the answer.

> not every `new Foo` actually results in a heap allocation, and can be optimised to work directly in registers.

While I know in my heart of hearts that you are right, this category of advice has so many caveats. It's like when Graal markets itself as a potential substitute for NodeJs, where its performance is abjectly terrible.

I have never personally succeeded in convincing anyone with these deep programming traditions to try something new that may be Better in Every Way. It always has to be this long, trickle down journey of adoption. People still talk about John Carmack and video game engineering, decades after any of his innovations have been verbatim relevant and decades after an individual could possibly have a hope and prayer of authoring a first person shooter with an audience from scratch. But see, I need to know a lot of stuff to understand that.

latchkey2y ago

> I have never personally succeeded in convincing anyone with these deep programming traditions to try something new that may be Better in Every Way.

This is true of almost anything really, not just programming.

vips7L2y ago

Yeah, scalar replacement is very conservative: things that a human can easily tell don't escape, the JIT often cannot.

This is 2ish years old but has a lot of detail on what escape analysis can and cannot see:

https://gist.github.com/JohnTortugo/c2607821202634a6509ec3c3...

cmrdporcupine2y ago

Yeah, it kinda messes with the "most objects die young" philosophy behind generational collectors.

Here's another (related) caveat with taking the "minimal object creation", pooled memory, and/or using lots of non-GC native heap memory path: now you've got blocks of memory sitting there in Process RSS that the GC either can't do anything about, or (worse) knows nothing about.

These types of runtimes are on the whole designed with the philosophy that they Own All The Things, so it's like an invasive body driving the "immune system" nuts...

Never had this problem in Java (haven't worked in it for 10 years) but at a previous job (in Julia) I worked on a database buffer pool / pager where the memory was explicitly managed/allocated (through syscalls to anonymous mmap) for performance reasons. Those pages belong to the same PID as the broader Julia process, but were not managed by Julia runtime, and so the runtime can run into all sorts of issues with OOM kills or huge pauses as the system allocates aggressively without collecting often, thinking it has plenty of headroom... but doesn't.

I know for a fact the JVM GC is smarter than this, and can be tuned more expertly to manage these type of situations, but it's still a big giant caveat...
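On the JVM, the closest everyday equivalent of such explicitly managed memory is a direct buffer. The sketch below (with the hypothetical name `OffHeapSketch`) allocates a page outside the Java heap; it counts toward process RSS but not toward heap occupancy. Direct buffers are capped by `-XX:MaxDirectMemorySize`, whereas memory obtained through custom native code or mmap is not covered by that limit.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only.
final class OffHeapSketch {
    private OffHeapSketch() {}

    // Allocates `bytes` of native memory outside the Java heap.
    static ByteBuffer allocatePage(int bytes) {
        return ByteBuffer.allocateDirect(bytes).order(ByteOrder.nativeOrder());
    }

    // Writes a long at an absolute byte offset and reads it back.
    static long roundTrip(ByteBuffer page, int offset, long value) {
        page.putLong(offset, value);
        return page.getLong(offset);
    }
}
```

The native memory behind a direct buffer is only released after the small on-heap `ByteBuffer` object becomes unreachable and is collected, which is one way heap pressure and native memory pressure can drift apart.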

coldbrewed2y ago

> In general, the new GCs are optimised to work the least and give the best performance when the allocation rate is neither too high nor too low; the new GCs also reuse memory better than object pools.

It's a little bit surprising to me that low object creation can degrade GC performance; what's the failure mode for G1/ZGC in this scenario?

Groxx2y ago

In many cases, because it defeats generational collection. It pushes everything into longer generations because they hang around longer.

Doing more young generation collection is sometimes cheaper in aggregate (more frequent but far smaller and usually much more efficient) than adding more data to the older generations (less frequent and more costly, longer pauses, for object pools it happens on all of it even when none of it is currently used, etc).

But as doctorpangloss said: so many caveats. There's ample evidence that it is both better and worse, it depends on lots of details.

The main thing you can confidently claim is that it is not the majority of code, so most language optimizations will choose to improve straightforward and common stuff at the cost of this niche. Not always, but there is definitely more energy in improving the 90%+ cases and that adds up over time. Squeezing out the last bits of performance requires constant upkeep.

pron2y ago

> what's the failure mode for G1/ZGC in this scenario?

GC barriers (special code that gets triggered on some operations by some GCs, such as when mutating a reference field during a GC cycle -- that's a "write barrier").

Concurrent GCs have special rules for newly allocated objects (which are really just a pointer bump) because no one else has seen them yet; young objects require no GC barriers. But once an object is old, a concurrent GC needs to do some work to learn about references in the object changing. So while allocating a new object is usually a pointer bump (and sometimes not even that when the object is scalarized), mutating an old object triggers a GC slow-path that has to mark some data in a shared data structure (so we're talking memory ordering fences) to make sure that the mutated pointer is not overlooked by the GC.
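A sketch of the two code shapes this explanation distinguishes, using hypothetical names (`OrderSketch`, `SlotSketch`, `PublishSketch`). The comments restate the behaviour described above; the actual barrier cost depends on the collector and JVM version and is not measured here.

```java
// Hypothetical types for illustration only.
final class OrderSketch {
    final String symbol;
    final long quantity;

    OrderSketch(String symbol, long quantity) {
        this.symbol = symbol;
        this.quantity = quantity;
    }
}

final class SlotSketch {
    OrderSketch current; // reference field: stores to it are what GC barriers watch
}

final class PublishSketch {
    // Pooled, long-lived holder. After surviving a few collections it is
    // likely to be treated as an old object.
    static final SlotSketch LONG_LIVED = new SlotSketch();

    private PublishSketch() {}

    // Reference store into a possibly old object: per the explanation above,
    // this is the case where a concurrent GC may take its barrier slow path.
    static void publishReused(OrderSketch order) {
        LONG_LIVED.current = order;
    }

    // Reference store into a brand-new object that no other thread has seen yet:
    // the allocation is usually just a pointer bump.
    static SlotSketch publishFresh(OrderSketch order) {
        SlotSketch slot = new SlotSketch();
        slot.current = order;
        return slot;
    }
}
```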

OpenJDK's new GCs are really, very, very good. The new generational ZGC in JDK 21 (with sub-millisecond worst case pause) is just amazing. But these GCs are optimised for "reasonable" Java code and against "unreasonable" object pooling. Things are different from where they were a decade ago in Java 8.

nu11ptr2y ago· 5 in thread

I mean no disrespect to the authors, but this seems like an extremely painful way to write an application. How common is it to write Java apps this way? It seems like there must have been a better language choice than trying to work against the GC and rewriting parts of the standard lib? And now adding in Rust via JNI? It just feels very painful to me.

jerrinotOP2y ago

QuestDB engineer here:

It's true that our non-idiomatic Java usage denies us some of the benefits typically associated with Java programming. Automatic memory management and the old "Write Once, Run Anywhere" paradigm are difficult to maintain due to our reliance on native libraries and manual memory management.

I see two classes of reasons for choosing Java:

1. Historical: The QuestDB codebase predates Rust. According to Wikipedia, the initial Rust release was in 2015. The oldest commit in the QuestDB repo is from 2014: https://github.com/questdb/questdb/commit/95b8095427c4e2c781... What were the options back in 2014? C++? Too complicated. C? Too low-level. Pretty much anything else? Either too slow or too exotic.

2. Technical: Java, even without GC or WORA, still offers some advantage.

2a: The tooling is robust, especially when compared to C++. This starts with build systems (don't get me started on CMake!), and extends to aspects like observability. Stacktraces in Java are taken for granted. What's the state of stacktrace walking/printing in C++? I think it boils down to either Boost, C++23, or some other form of black magic. (I might be wrong here tho)

2b: It's a simpler language, especially when compared to C++ or even Rust. This makes it easier to hire people and also attracts external contributors: https://github.com/questdb/questdb/graphs/contributors

2c: The HotSpot JIT still provides us with solid peak performance without having to mess with PGO, etc.

2d: Concurrency is easier with Java's managed memory, eliminating the need for hazard pointers and the like.

grandinj2y ago

What do you use as a build system for Java? We use ant (old, ugly, reliable) and gradle (new, nice looking and absolutely horrible to maintain)

giancarlostoro2y ago

Looks like they have an interesting range of customers[0], so my take on "why even add the Rust" is that their customers are already using Java, and a total rewrite might be considered irresponsible as it would be incompatible with their existing customer base. I do have to wonder, though, whether some serious Java refactoring would have helped at all. How many code smells do they have going on in their codebase? Or is it, to the best of their knowledge, a really clean Java codebase?

Note Discord themselves has used Rust for bottlenecks with Erlang/Elixir/Beam[1].

[0]: https://questdb.io/customers/

[1]: https://discord.com/blog/using-rust-to-scale-elixir-for-11-m...

nu11ptr2y ago

I'm not suggesting they rewrite. I'm questioning if Java was ever the best choice for this application in the first place. They are using Java as if it were C++. Perhaps it would have been better to write it in C++ in that case? I'm not drawing conclusion, as I'm not in their domain and have not given this a lot of thought, but it just strikes me as a particularly shaky foundation.

mrweasel2y ago

While they do present a reasonable explanation for needing to use a different language, I do agree it seems painful and the type of decision that will lead to a follow-up article in a few years: "Why we're ripping out Rust".

In some sense it reads a lot like they just needed an excuse to use Rust. Not that Rust is a bad choice for the areas they list as possible candidates for non-Java code, it's just a really odd choice for a Java shop, but then again, so is fighting the GC.

It is incredibly fascinating work though.

exabrial2y ago· 3 in thread

I've never heard of QuestDB until this post, but I very much like what you guys are doing.

InfluxDB 1.x had a chance to be great: they wrote a time series database that used a SQL-like dialect, they had a really nice alerting platform, they had a really nice visualization tool. Then they abandoned all of that to jump on the hype train of "Write a new programming language!" which was the hot thing like 5 or 6 hype cycles ago. We've _never_ upgraded to their 2.x product because it literally threw away our investment.

I think if you guys pick up where they departed you'll be tremendously successful.

nhourcard2y ago

Thanks for the kind words! We want to stick with SQL, having added a few extensions to make it easier to work with time series data, such as SAMPLE BY, LATEST ON, etc. Window functions, which the product has been lacking for some time, are next, to bridge the gap vs other more mature platforms while offering something very new and unique on the performance side, especially ingestion related.

jurgenkesker2y ago

How are you feeling about the upcoming InfluxDB 3? They moved to Rust and support InfluxQL again.

exabrial2y ago

I'll have to take a look. If Kapacitor is back, then I'm in.

crustycoder2y ago· 2 in thread

Nowadays there's a much better technology than JNI for doing this sort of thing in Java.

https://docs.oracle.com/en/graalvm/jdk/21/docs/reference-man...

karussell2y ago

Native image is unrelated to this topic here (hence the downvotes, I guess).

Still GraalVM could be an interesting solution as Rust seems to be supported: https://www.graalvm.org/latest/reference-manual/llvm/Compili...

crustycoder2y ago

https://github.com/oracle/graal/blob/master/substratevm/src/...

And in any case:

JEP draft: Prepare to Restrict The Use of JNI https://openjdk.org/jeps/8307341

JEP 442: Foreign Function & Memory API (Third Preview) https://openjdk.org/jeps/442

klauserc2y ago· 1 in thread

Very interesting to see how the Rust-JNI interface gets used in a production environment (e.g., the topic of unifying logging typically doesn't come up in "your first Rust-JNI app" tutorials).

I do have one question around the assignment to `static mut CALL_STATE`. Don't you need some form of synchronization/memory fence/memory barrier to make sure that other threads see that assignment?

On x86/x64 it probably doesn't matter (total store order), but other architectures are less lenient.

amunra__2y ago

We have an initialisation step as soon as we load the jni lib that takes care of this. Given that this gets done before any other threads are started, I don't think there'd be an issue. Good point :-)
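The same "initialise before any other thread exists" reasoning has a direct analogue on the Java side, sketched below. This is Java, not the Rust code from the article, and `InitBeforeStartSketch` is a hypothetical name. Under the Java Memory Model, a call to `Thread.start()` happens-before every action in the started thread, so a plain (non-volatile) write made before `start()` is visible inside that thread.

```java
// Hypothetical class for illustration only.
final class InitBeforeStartSketch {
    // Deliberately plain: not volatile, not atomic.
    static int callState;

    private InitBeforeStartSketch() {}

    // Writes the field, then starts a thread that reads it back.
    static int readFromNewThread(int value) {
        callState = value; // written before the reader thread is started
        final int[] seen = new int[1];
        Thread reader = new Thread(() -> seen[0] = callState);
        reader.start(); // start() happens-before all actions in `reader`
        try {
            reader.join(); // all actions in `reader` happen-before join() returns
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for reader", e);
        }
        return seen[0];
    }
}
```

The guarantee only covers threads started after the write. A thread that is already running when the field is assigned would still need a volatile field, an atomic, or a lock.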

LispSporks222y ago· 1 in thread

I needed to call Rust from Java a few years ago. The approach was to build a straightforward .so and call into it via JNA. It seemed way less complicated.

amunra__2y ago

The rust-maven-plugin we wrote indeed also supports JNA for these simple use cases.

https://github.com/questdb/rust-maven-plugin

Compared with JNA, JNI is indeed more complex, but it's faster and has more features. It also solves the problem of calling Java from Rust.

exabrial2y ago

> JNI

Have you guys benchmarked FFI in Java 21 (preview, now release) yet? :) Yes I know it's super new, but I'm curious if there is a benefit in terms of:

1. performance

2. ease of maintenance

3. ease of finding production problems
