Still, I think the left-right crate could benefit from one optimization I came up with. This is what the lib says in the docs:
> When WriteHandle::publish is called, the writer atomically swaps the reader pointer to point to the other T. It then waits for the epochs of all current readers to change, and then replays the operational log to bring the stale copy up to date.
I think the lib could postpone the log replaying until the next publish. Chances are all readers will have the new epoch by then, so the writer thread won't have to wait at all.
The first 3 years of my PhD summarized in one sentence.
I think you can't even start a new publish until all readers have finished switching to the new epoch after the previous publish. Otherwise, you risk corrupting data for readers that are still on the original epoch.
- create two ring buffers (left/right; I called them hot/cold)
- store the hot ring buffer in an atomic pointer
- single reader swaps hot and cold and waits for writers to finish
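The three steps above can be sketched in Java roughly like this. Everything here is my reconstruction, not the actual implementation: `Ring` is a placeholder for the real ring buffer, and I'm using a per-ring writer counter so the reader knows when the old hot ring has drained.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

class Ring {
    final long[] slots = new long[1024];
    final AtomicLong activeWriters = new AtomicLong(); // writers currently inside this ring
}

class HotCold {
    private final AtomicReference<Ring> hot = new AtomicReference<>(new Ring());
    private Ring cold = new Ring();

    // Writer side: pin the hot ring so the reader's drain loop sees us.
    private Ring pin() {
        for (;;) {
            Ring r = hot.get();
            r.activeWriters.incrementAndGet();
            if (hot.get() == r) {
                return r;                      // still the hot ring: safe to write here
            }
            r.activeWriters.decrementAndGet(); // reader swapped under us; retry
        }
    }

    void write(int slot, long value) {
        Ring r = pin();
        r.slots[slot] = value;
        r.activeWriters.decrementAndGet();     // unpin
    }

    // Single reader: swap hot/cold, then wait for straggling writers to drain.
    Ring swapAndDrain() {
        Ring prev = hot.getAndSet(cold);
        cold = prev;
        while (prev.activeWriters.get() != 0) {
            Thread.onSpinWait();
        }
        return prev; // no writer is inside it any more; safe to read
    }
}
```

The increment-then-recheck in `pin()` is the subtle part: without the recheck, a writer could pin a ring the reader has already swapped out and declared drained.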
It was quite a journey: at first I thought I had invented a novel concurrency scheme. However, it turned out to be simply a mix of my ignorance and hubris! :-)
Still, I had a lot of fun while designing this data structure and I believe it made a nice story. Ask me anything!
[0] https://github.com/BurntSushi/fst
[1] https://blog.burntsushi.net/transducers/
The main limitation is that it doesn't support removal, but if removals are infrequent you can work around that with another fst holding the removed items (and periodically rebuild the whole thing).
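The lookup logic for that workaround looks roughly like this; I'm using plain `Set`s as stand-ins for the two fsts, since the shape of the trick is the same:

```java
import java.util.HashSet;
import java.util.Set;

class SetWithTombstones {
    private final Set<String> base = new HashSet<>();    // stands in for the main fst
    private final Set<String> removed = new HashSet<>(); // stands in for the "removed" fst

    void add(String key) {
        base.add(key);
        removed.remove(key); // re-adding a key cancels its tombstone
    }

    void remove(String key) {
        removed.add(key);    // mark as removed instead of mutating the base
    }

    boolean contains(String key) {
        // Present iff in the main structure and not marked removed.
        return base.contains(key) && !removed.contains(key);
    }

    // Periodic rebuild: fold the tombstones back into the base structure.
    void rebuild() {
        base.removeAll(removed);
        removed.clear();
    }
}
```

With real fsts the rebuild is the expensive step (you reconstruct the whole automaton), which is why it only pays off when removals are rare.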
That way, users of the class will be warned by the IDE that they should use try-with-resources when acquiring a Reader.
For the sake of completeness: the Java compiler will warn about it too, but that warning is disabled by default.
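For illustration, here's a hypothetical resource class (not the article's actual Reader). Implementing `AutoCloseable` is what lets IDEs flag call sites that skip the try-with-resources:

```java
// Hypothetical resource class for illustration:
class Reader implements AutoCloseable {
    private boolean closed;

    String read() {
        if (closed) throw new IllegalStateException("reader is closed");
        return "data";
    }

    @Override
    public void close() {
        closed = true; // release the underlying resource here
    }
}

class Demo {
    static String use() {
        // The reader is closed automatically when the block exits,
        // even if read() throws.
        try (Reader r = new Reader()) {
            return r.read();
        }
    }
}
```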
Or, more generally, minimizing the odds of waiting on a lock, without resorting to a complex lock free scheme?
A disadvantage of that approach is that even without any lock contention you would still have two writes to memory for every lock acquisition.
ByteBuffer.allocateDirect should do that, IIRC. This lets you use the standard ConcurrentHashMap while still getting a stable pointer for the Rust logic.
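A sketch of that idea (my own, hypothetical code): the map itself is a plain `ConcurrentHashMap`, and only the key bytes live in direct buffers, whose backing memory sits outside the Java heap and therefore won't be moved by the GC. The native side would obtain the raw address via JNI's `GetDirectBufferAddress`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ConcurrentHashMap;

class DirectKeys {
    // String -> off-heap copy of its UTF-8 bytes. The map is ordinary Java;
    // only the byte storage is off-heap, where the GC never relocates it.
    private final ConcurrentHashMap<String, ByteBuffer> keys = new ConcurrentHashMap<>();

    ByteBuffer intern(String key) {
        return keys.computeIfAbsent(key, k -> {
            byte[] utf8 = k.getBytes(StandardCharsets.UTF_8);
            ByteBuffer buf = ByteBuffer.allocateDirect(utf8.length);
            buf.put(utf8);
            buf.flip();
            return buf; // address stays valid while this buffer stays reachable
        });
    }
}
```

The caveat is lifetime: the address is only stable as long as the Java side keeps the buffer reachable, so the map doubles as the ownership record.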
Another simple way, if we don't like the idea of triggering GCs manually, is to allocate the same buffer both off-heap and on-heap: use the off-heap one for actual key storage, and the on-heap one just to generate heap memory pressure.
1. Technical
The contract is given - I'm receiving usernames as CharSequence(String). I could change that, but it would likely require changes in the SQL parser - not simple at all. Alternatively, I could pass the whole CharSequence over the JNI boundary - but passing objects across JNI comes with a perf. penalty (compared to passing primitives), and we avoid that whenever possible. Or I could encode the CharSequence content into a temporary off-heap buffer and pass just the pointer to Rust. But that brings back some of the questions from the article - like who owns the buffer?
2. Other reasons
I realized this was a possibility only when I was nearly done with the design (the whole endeavor took less than one day), and I felt the urge to finish it. Also: this article wouldn't have been written!