Collaborative text editing with Eg-Walker: Better, faster, smaller

(arxiv.org)

251 points · czx111331 · 1y ago · 31 comments

matlin · 1y ago · 5 in thread

Seph (author) also has a reference implementation in TypeScript: https://github.com/josephg/eg-walker-reference

I've stated before that I think the main thing holding back collaborative text / sequence CRDTs is integration with a production database.

Eg-walker looks interesting because it might lend itself to database integration: the operations are immutable and append-only. However, to demonstrate the effectiveness of these algorithms, library authors (see Yjs, DiamondTypes, etc.) build stand-alone data structures (usually specialized search trees) that most databases already provide.

Personally, I've been trying to adapt a Piece Table[1] to be collaborative and stored in Triplit[2], which runs on both client and server and already implements logical clocks, but I might see how well I can adapt this algorithm instead!

[1] https://en.wikipedia.org/wiki/Piece_table
[2] https://github.com/aspen-cloud/triplit

btown · 1y ago

This seems to be a holy grail, to be honest! Super-simple database representations with barely any processing required on the "write path," instant startup, minimal memory requirements on both server and client without a need for CRDT data structures to be in memory, none of the O(n^2) complexity of OT. In fact, if I'm interpreting it correctly, it should be straightforward to get this working in a serverless environment without any notion of session fixation, nor active documents needing to be kept in memory.

I can see this completely reshaping the landscape of what's possible with collaborative documents!

josephg · 1y ago

Author here. Thanks! Yeah this is my hope too.

Egwalker has one other advantage here: the data format will be stable and consistent. With CRDTs, every different crdt algorithm (Yjs, automerge/rga, fugue, etc) actually stores different fields on disk. So if someone figures out a new way to make text editing work better, we need to rip up our file formats and network protocols.

Egwalker just stores the editing events in their original form (e.g. insert “a” at position 100). It uses a crdt implementation in memory to merge concurrent changes (and everyone needs to use the same crdt algorithm for convergence). But the network protocol and file format are stable no matter what algorithm you use.
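To make "events in their original form" a bit more concrete, here is a minimal sketch in Python. The schema (`id`, `parents`, `kind`, `pos`, `content`) and the JSON-lines encoding are assumptions made up for this illustration; they are not the file format from the paper or the reference implementation. The replay function only handles a log with no concurrent edits, so it leaves out the merging step entirely.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class EditEvent:
    """One editing event, recorded as the user performed it (hypothetical schema)."""
    id: tuple                      # (agent name, sequence number)
    parents: tuple                 # ids of the events this one directly follows
    kind: str                      # "ins" or "del"
    pos: int                       # index in the document as the author saw it
    content: Optional[str] = None  # inserted character, None for deletes


def append_event(log: list, event: EditEvent) -> None:
    """Append-only write path: serialize the event and add it to the log."""
    log.append(json.dumps(asdict(event)))


def replay_linear(log: list) -> str:
    """Rebuild the text from a log that contains no concurrent events.

    Concurrent events would first need their indexes adjusted by a merge
    step, which this sketch deliberately omits.
    """
    text = []
    for line in log:
        ev = json.loads(line)
        if ev["kind"] == "ins":
            text.insert(ev["pos"], ev["content"])
        else:
            del text[ev["pos"]]
    return "".join(text)
```

Nothing in the stored records depends on which merge algorithm a peer runs in memory, which is the property the comment above is pointing at.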

benjismith · 1y ago

Awesome, I've been following Seph's work for many years! Always thoughtful and well-executed. Probably the most prolific and insightful engineer in the "collaborative text editing" universe.

I use ShareDB every day, which originated from Seph's excellent work on OT algorithms. Good stuff!

josephg · 1y ago

Good to hear it’s still in use! That’s very kind.

riedel · 1y ago

There was a recent thread about the 2001 post that afaik eventually led to this paper (diamond types is the Rust implementation): https://news.ycombinator.com/item?id=41372833

britannio · 1y ago · 2 in thread

Joseph explains the algorithm on YouTube too: https://www.youtube.com/watch?v=rjbEG7COj7o

It's great work, combining the best of OT and CRDTs.

auggierose · 1y ago

I find the formulation in the abstract slightly confusing. As far as I understand EG-Walker is a CRDT, an operation-based one.

josephg · 1y ago

Author here. It’s kinda both a crdt and an operational transform system.

It’s a crdt in that all peers share & replicate the set of all editing events. (A grow-only set crdt if we’re being precise). Peers can use those editing events to generate the document state at any point in time, merge changes and so on.
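For readers unfamiliar with the term, a grow-only set merges by plain set union. The sketch below is a generic illustration in Python, assuming events are hashable tuples with a made-up layout `(agent, seq, kind, pos, content)`; it is not taken from the paper's code.

```python
from typing import FrozenSet, Tuple

# Hypothetical event layout: (agent, seq, kind, pos, content)
Event = Tuple[str, int, str, int, str]


def merge(local: FrozenSet[Event], remote: FrozenSet[Event]) -> FrozenSet[Event]:
    """Grow-only set merge: keep every event either peer has seen."""
    return local | remote


# Two peers that each made one edit while offline.
alice = frozenset({("alice", 0, "ins", 0, "a")})
bob = frozenset({("bob", 0, "ins", 0, "b")})

# Union is commutative and idempotent, so sync order and repeats do not matter.
alice_view = merge(merge(alice, bob), bob)
bob_view = merge(bob, alice)
```

Both peers end up holding the same set of events. Turning that shared set into the same text on every peer still requires a deterministic replay, which is the part the rest of this comment describes.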

But the editing events themselves are stored and expressed in their “original” form (unlike existing CRDTs, which need a prepare function). That means lower memory usage during use.

The replaying / merging process itself is kind of a batch operational transform algorithm. It works by building a normal crdt state object in memory in order to transform the events so they can be replayed. In that sense, it’s an OT system. (But one which transforms by using a crdt, like Yjs, internally within each peer).

I don’t know if that clarifies things. Feel free to ask more questions!

abdullahkhalids · 1y ago · 2 in thread

Does collaborative whiteboard-like software use the same algorithms, or are there more suitable algorithms for picture collaboration?

sno6 · 1y ago

They usually use a central server and last-writer-wins semantics.

Figma, for example: https://www.figma.com/blog/how-figmas-multiplayer-technology...

I've seen Cloudflare Durable Objects used quite a lot.

There are other emerging patterns too: https://www.instantdb.com/
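As a rough idea of what last-writer-wins on a central server can look like, here is a small Python sketch. It keeps one register per (object, property) pair and lets the write with the highest `(time, client)` stamp win. The class names, the stamp layout, and the tie-break by client id are all assumptions for this example and do not describe how any particular product implements it.

```python
from dataclasses import dataclass, field


@dataclass
class LWWRegister:
    """Holds a single value; the write with the highest stamp wins."""
    value: object = None
    stamp: tuple = (0, "")  # (logical time, client id); client id breaks ties

    def write(self, value, time: int, client: str) -> None:
        if (time, client) > self.stamp:
            self.value = value
            self.stamp = (time, client)


@dataclass
class Whiteboard:
    """Hypothetical server-side state: one register per (object, property)."""
    props: dict = field(default_factory=dict)

    def apply(self, obj: str, prop: str, value, time: int, client: str) -> None:
        self.props.setdefault((obj, prop), LWWRegister()).write(value, time, client)

    def get(self, obj: str, prop: str):
        return self.props[(obj, prop)].value


board = Whiteboard()
board.apply("rect1", "x", 10, time=1, client="a")
board.apply("rect1", "x", 20, time=2, client="b")
board.apply("rect1", "x", 15, time=1, client="c")  # stale write, ignored
```

Because each property is independent, two people moving different shapes (or changing different properties of the same shape) never conflict, which is a large part of why this is simpler than text.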

josephg · 1y ago

There are usually more suitable algorithms for picture collaborations.

Text is hard because it’s a list of characters, and when items are inserted and deleted the operations change the index of all subsequent elements.

Usually, editing a digital whiteboard is much simpler.
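The index-shifting problem can be shown with two concurrent inserts. The Python sketch below uses a classic OT-style position adjustment for illustration only; it is not the Eg-walker algorithm, and the function names and the tie-break flag are made up for this example.

```python
def insert(text: str, pos: int, s: str) -> str:
    """Return text with s inserted at index pos."""
    return text[:pos] + s + text[pos:]


def transform_insert(pos: int, other_pos: int, other_len: int,
                     other_wins_ties: bool = True) -> int:
    """Adjust an insert position for a concurrent insert applied before it.

    If the other insert landed at or before our position, everything after
    it shifted right by other_len, so our index has to shift too.
    """
    if other_pos < pos or (other_pos == pos and other_wins_ties):
        return pos + other_len
    return pos


base = "hello"
# Alice appends "!" at index 5 while Bob concurrently prepends "Oh, " at index 0.
after_bob = insert(base, 0, "Oh, ")

naive = insert(after_bob, 5, "!")                          # "Oh, h!ello"
fixed = insert(after_bob, transform_insert(5, 0, 4), "!")  # "Oh, hello!"
```

Applying Alice's index unchanged puts the "!" in the middle of the word, because Bob's insert moved every later character four places to the right.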

1attice · 1y ago · 1 in thread

s/e.g./EG/

deathanatos · 1y ago

s/e.g./Eg/, which is how the paper stylizes it?

no1youknowz · 1y ago

I've seen a comment from the YT page:

> While the downside of OT is p2p, the one up side is that you get GIT like history that is super valuable for us especially if we want to build a CDC system.

How trivial would it be to implement a CDC system from a CRDT? Does anyone know of any GitHub repos or documentation I could refer to? Thanks

Palmik · 1y ago

Saw the YouTube video when it was first posted, and it could be a great match for a new project I have in mind.

Is there a practical implementation yet that supports not just strings, but also lists and maps?

Would be great to see it integrated into yjs / y-crdt.

eclectic29 · 1y ago

If Martin Kleppmann is the author I know this stuff will be worth watching out for.

canadiantim · 1y ago

Looks like amazing work, congrats!! Excited to see implementations in the wild, would definitely be keen to play around with it.
