BUT, since you mention it, I'll say a bit here. It sounds like you have your own experience, and we'd love to hear about that. But OUR experience was: (1) we found (contrary to popular belief) that OT actually does not require a centralized server, (2) we found it to be harder to implement OT exactly right vs CRDTs, and (3) we found many (though not all) of the problems that CRDTs have, are also problems in practice for OT—although in fairness to OT, we think the problems CRDTs have in general are vastly worse to the end-user experience.
If there's interest I'm happy to write a similar article entirely dedicated to OT. But, for (3), as intuition, we found a lot of the problems that both CRDTs and OT have seem to arise from a fundamental impedance mismatch between the in-memory representation of the state of a modern editor, and the representation that is actually synchronized. That is, when you apply an op (CRDT) or a transform (OT), you have to transform the change into a (to use ProseMirror as an example) valid `Transaction` on an `EditorState`. This is not always easy in either case, and to do it right you might have to think very hard about things like "how to preserve position mappings," and other parts of editor state that are crucial to (say) plugins that manage locations of comment marks or presence cursors.
With all of that said, OT is definitely much closer to what modern editors need, in my opinion at least. The less-well-known algorithm we ended up recommending here (which I will call "Marjin Collab", after its author) is essentially a very lightweight OT, without the "transformation" step.
What definitely helped too was having a very specific use case (rich-text editing) which guided many of our decisions. We focused heavily on getting the user experience right for common editing scenarios. And I fully agree that it's not just about conflict resolution, but also things like preserving position mappings. All these mechanisms need to work together for the experience to make sense to the end user.
This is an older piece (from 2018), but we shared more details about our approach here: https://ckeditor.com/blog/lessons-learned-from-creating-a-ri...
One clear issue with OT that we still face today is its complexity. It's nearly 8 years since we launched it, and we still occasionally run into bugs in OT – even though it sits at the very core of our engine. I remember seeing a similar comment from the Google Docs team :D
In theory, yes, but in practice, any OT that operates without a central server (or master peer) essentially ends up being a CRDT. A CRDT is a subset of OT, specifically one that adds the requirement of P2P support.
> (2) we found it to be harder to implement OT exactly right vs CRDTs
I would say that each has its own complexity in different areas. CRDT's complexity lies in its data structure and algorithm, while OT's lies in its sync engine (since it must handle race conditions and guarantee deterministic ordering). In my opinion, OT is simpler overall. Hopefully DocNode and DocSync will make OT even easier.
> (3) we found many (though not all) of the problems that CRDTs have, are also problems in practice for OT
Oh, definitely not! OT has many benefits[1]. I think the misconception stems from the common belief that OT should be positional, rather than id-based. In the first case, operations are transformed on other operations. In the second case, operations can also be transformed on the current document (O(1)), eliminating the problems commonly associated with OT. This is the approach I use in DocNode.
> the problems CRDTs have in general are vastly worse to the end-user experience.
This is 100% correct.
____
https://svn.apache.org/repos/asf/incubator/wave/whitepapers/...
> This is not always easy in either case, and to do it right you might have to think very hard about things like "how to preserve position mappings," and other parts of editor state that are crucial to (say) plugins that manage locations of comment marks or presence cursors.
Maintaining text editor state is normal. Yes you do need to convert the OT messages into whatever diff format your editor requires (and back), but that's the standard glue code.
The nice thing about OT is that you can just feed the positions of marks into the OT algorithm to get the new positional value. Worst case, you just have the server send the server side position when sending the OT event and the client just displays the server side position.
[1]: https://github.com/gritzko/librdx/tree/master/rdx [2]: https://github.com/gritzko/librdx/tree/master/json
You talk about "rebasing" changes in the article. Does that not imply a "transformation" step?
You don't think Figma is a serious application?
By all means, use OT. I worked on OT software for many years - and my work on OT types, ShareJS and ShareDB is still in production all over the place. But I don't think there's anything you can do with OT that you can't do just as well with CRDTs.
The only real benefit of OT is that its simpler to reason about. Maybe that's enough.
I don't know where this popular belief came from. The Figma blog literally says "Figma isn't using true CRDTs"[1].
> The only real benefit of OT is that its simpler to reason about.
That's incorrect. When you free yourself from the P2P restriction that CRDTs are subject to, there's a huge amount of metadata you can get rid of, just to mention one benefit.
[1] https://www.figma.com/blog/how-figmas-multiplayer-technology...
If you are using a centralized server and ProseMirror, there are several OT and pseudo-OT implementations. Most popularly, there is prosemirror-collab[4], which is basically "OT without the stuff you don't need with an authoritative source for documents." Practically speaking that means "OT without T", but because it does not transform the ops to be order-independent, it has an extra step on conflict where the user has to rebase changes and re-submit. This is can cause minor edit starvation of less-connected clients. prosemirror-collab-commit[5] fixes this by performing the rebasing on the server... so it's still "OT without the T", but also with an authoritative conflict resolution pseudo-T at the end. I personally recommend prosemirror-collab-commit, it's what we use, and it's extremely fast and predictable.
If you just want something pedogocically helpful, the blessed upstream collaborative editing solution for CodeMirror is OT. See author's blog post[1], the @codemirror/collab package[2], and the live demo[3]. In general this implementation is quite good and worth reading if you are interested in this kind of thing. ShareJS and OTTypes are both very readable and very good, although we found them very challenging to adopt in a real-world ProseMirror-based editor.
[1]: https://marijnhaverbeke.nl/blog/collaborative-editing-cm.htm...
[2]: https://codemirror.net/docs/ref/#collab
[3]: https://codemirror.net/examples/collab/
[4]: https://github.com/ProseMirror/prosemirror-collab
[5]: https://github.com/stepwisehq/prosemirror-collab-commit
DocSync, which is the sync engine mainly designed with DocNode in mind, I would say is a bit less mature.
I’d love it if you could take a look and see if there’s anything that doesn’t convince you. I’ll be happy to answer any questions.
That is one hot take!
We're the nerdiest bunch in the world, absolutely willing to learn and adapt the most arcane stuff if it gives us a real of percieved advantage, yet the fact that Google Docs style CRDTs have completely elided the profession speaks volumes about their actual usefulness.
Hmm -- this seems a bit apples and oranges to me: collaborative editing is sync; git branches, PRs, etc. are all async. This is by design! You want someone's eyes on a merge, that's the whole rationale behind PRs. Collab editing tries to make merges invisible.
Totally different use case, no?
Just because programming code isn't a good use case for automated conflict resolution doesn't mean everything else isn't.
Just imagine non-technical people using git to collaborate on a report, essay, or blog post. It's never going to happen.
But git branches are collaborative editing! They're just asynchronous collaborative editing.
It would be possible to build a git clone on top of CRDTs, which had the same merge conflict behaviour. The advantage of a system like that would be that you could use the same system for both kinds of collab. editing - realtime collab editing and offline / async collab editing. Its just, nobody has built that yet.
> the fact that Google Docs style CRDTs have completely elided the profession speaks volumes about their actual usefulness.
Software engineers still rely on POSIX files for local cross-app interoperability. Eg, I save a file in my text editor, then my compiler reads it back and emits another file, and I run that. IMO the real problem is that this form of IPC is kind of crappy. There's no good way to get a high fidelity change feed from a POSIX file. To really use CRDTs we'd need a different filesystem API. And we'd need to rewrite all our software to use it.
That isn't happening. So we're stuck with hacks like git, which have to detect and reconstruct all your editing changes using diffs every time you run it. This is why we don't have nice things.
Personally, my main point of frustration with git is the lack of a well-supported ast-based diff. Most text in programming are actually just representations of graphs. I’m sure there is a good reason why it hasn’t caught on, but I find that line-based diffs diverging from what could be a useful semantic diff is the main reason merge conflicts happen and the main reason why I need to stare hard at a pull request to figured out what actually changed.
Google Docs is OT though.
I feel like this post is overly hyperbolic about the choices that an Open Source maintainer made years ago, and no one seemed to care enough to pay him to rewrite it.
We will have better support for ProseMirror's position mapping, but I'm not confident that we will make any progress on this specific issue, since it requires a much larger refactor than what we are contracted for (suggestions & versioning). For selection mapping specifically, we can of course make an effort here to fix it, but the underlying limitations of the split node operations would still be there.
But I hope that you also can appreciate that mapping prosemirror to a CRDT structure is a very complex thing to do. Your schema implementation and node-splitting behaviors are extremely hard to map to existing conflict resolution libraries.
Other editors, like Quill, map better to CRDT structures.
This is, of course, not your problem. But if you - or any editor author for that matter - ever end up creating another editor library, I'd love to work with you on adaptability to existing conflict resolution libraries. There is a lot to gain from being compatible to the wider ecosystem.
After all, we shouldn't expect our users to set up a tailor made backend for each single collaborative component.
Encrypted editing apps like Proton Mail Docs wouldn't be possible with your solution.
To speak on yjs: We use yjs over at LegendKeeper. We're not a huge app, but our users do worldbuilding for D&D, and have amassed over 30+ million collaborative documents, ranging from rich text to fantasy maps to fantasy timelines. Is yjs technically overkill when you have a central tie-breaker? Sure, but the DX is fantastic, and personally I love the idea of my application being truly local-first, even if our core value prop is not necessarily tied to being offline. It also gives me a legacy support plan for our users in case I ever get hit by a bus. :)
On the tech side, you save a lot of cognitive overhead when you can just do:
applyUpdate(docA, update1) applyUpdate(docB, update1)
and now docA and docB are in the same state, no matter what the context. For centralization, adding in a "well, they'll converge once we add a third party" absolutely increases the cognitive complexity of reasoning about your code, and limits your ability to write clean tests. Centralization buys you a lot, too. I don't think one is correct over the other.
There are tradeoffs. There's a memory and CPU cost, and yes, sometimes the "Technically merged state" of a yjs-prosemmirror document is not what's expected. Over seven years and 150,000+ users, we've never had a single person complain about it.
As this article is blowing up now, I want to address a few points.
I, too, feel the need for simplicity over overly complex solutions - and I found it in CRDTs. They beautifully allow me to reason about conflicts - so that my users don't have to. Very few people can design a custom conflict resolution algorithm for an application. Yjs is a general-purpose framework that enables you to make EVERYTHING collaborative. That's the goal.
It's fine if you want to explore different solutions. I don't understand the need to put down one framework in favor of another. It doesn't have to be "OT vs CRDT". Hey, if you found something that works for you - great! But let me tell you that neither solution magically makes everything simpler. There is still a lot to learn.
Different solutions to conflict resolution have different tradeoffs. It's unfortunate that the author of the article attributes all complexity to Yjs. It's just that collaborative editing is a very complex problem and requires a lot of attention to detail. In many regards, Yjs has done very well for the larger ecosystem. In other regards there is room for improvement.
The only thing I acknowledge from the article is the criticism about y-prosemirror "replacing the whole document". Unfortunately, the author extrapolated some false assumptions. This is not a performance issue. y-prosemirror runs at 60 fps even on large documents. It's like arguing React is slow because it replaces the whole document with every edit. We leverage ProseMirror's behavior to do identity checks on the nodes before updating the DOM. However, it's true that this breaks positions for some plugins (e.g. a comment plugin).
Instead of Prosemirror positions, we encourage plugins to use Yjs-based positions, which are more accurate in case of conflicts. Marijn talks about this as well [1]. The collab implementation in Prosemirror does not guarantee that positions always converge. That means, comments could end up in different places for different collaborators. This works in most cases, but in some it doesn't - which is one of the reasons why I prefer CRDTs as a framework to think about conflicts.
But as Nick said, we are currently working on a new y-prosemirror binding that works better with existing plugins.
I'm curious about the section "CRDTs are much, much harder to debug" which ironically talks about how hard prosemirror-collab is to debug. You won't find any such bugs in Yjs. The conflict-resolution algorithm is quite simple and has been battle tested. Before every release, Yjs undergoes extensive fuzz testing for hours in simulated scenarios. I'm very happy to show anyone how to debug a CRDT. It requires some background information, but it ultimately is easier.
To address another unfounded claim by the author: I bet OP $1000 that the GC algorithm in Yjs is correct even in offline-editing scenarios. He won't be able to reproduce the issues he is talking about.
[1]: https://marijnhaverbeke.nl/blog/collaborative-editing-cm.htm...
I didn't take the article as "putting down Yjs", just suggesting it's not the best solution for ProseMirror-backed product use cases patterned after their own.
> y-prosemirror runs at 60 fps even on large documents
Did the OP claim Yjs was slow? Have you created a ProseMirror-backed product of the complexity of Confluence's editor with 16ms frame time targets? The challenge isn't the collab algorithm as much as the CPU time of the plugins, "smart" nodes, and other downstream work triggered by updates. It's incredibly useful to have control over the granularity of updates and that is IMHO easier when dealing more closely with ProseMirror steps and transactions .
> I bet OP $1000 that the GC algorithm in Yjs is correct even in offline-editing scenarios.
Did the OP claim the Yjs algorithm is incorrect?
> You won't find any such bugs in Yjs. The conflict-resolution algorithm..
I didn't take the debugging section to indicate an issue with Yjs convergence.. Nor do most people encounter "bugs" with prose-mirror collab; last update 3 years ago? The debugging challenge is typically around "how did we arrive at this document state"(what steps got us here, where did they come from) and how steps/updates interact with plugins. IMHO discarding the original steps and dealing with a different unit of change complicates that greatly. Especially when dealing with a product in production and a customers broken document that needs to be fixed and root caused..
For example, the debugging section is NOT about debugging problems inside Yjs, it's about debugging one's own bugs while one is using something like Yjs or prosemirror-collab. I'm normally a happy gambler but I actually think the Yjs GC algorithm probably does do what it says... indeed, my complaint is that I specifically do not want that behavior. :)
Likewise, the point I'm making about performance is NOT that Yjs is "slow". I'm saying that it's empirically very challenging to meet the 16ms perf budget even in the simplest possible realistic collab scenario... and that because of this, it is (1) very unappealing and (2) in our experience, challenging to attempt to do this when you also have to do a pile of extra things that are unrelated to the task at hand, like translate `Transaction` to and from operations on an XML doc, and deal with all the consequences of messed up positions passed to plugins. I do understand you have your own benchmarks that give you the confidence to (without qualifications) claim that y-prosemirror "runs at 60fps". You really are not curious about why we think that's not the case?
If we can't get to a shared understand what is being said here, it's going to be very hard to talk about it at all. And on the two material points at stack in your response, I believe you have the precise opposite understanding of what was written. I'm happy to keep discussing but it feels like we're starting from scratch after this response, again.
I’ve spent 3+ years fighting the same problems while building DocNode and DocSync, two libraries that do exactly what you describe.
DocSync is a client-server library that synchronizes documents of any type (Yjs, Loro, Automerge, DocNode) while guaranteeing that all clients apply operations in the same order. It’s a lot more than 40 lines because it handles many things beyond what’s described here. For example:
It’s local-first, which means you have to handle race conditions.
Multi-tab synchronization works via BroadcastChannel even offline, which is another source of race conditions that needs to be controlled.
DocNode is an alternative to Yjs, but with all the simplicity that comes from assuming a central server. No tombstones, no metadata, no vector clock diffing, supports move operations, etc.
I think you might find them interesting. Take a look at https://docukit.dev and let me know what you think.
And for real "action" there should be a delay/pause button to simulate conflicts like the ones described in the blog
The feedback about the delay/pause button is also good, thanks!
I tried to understand what was wrong in Yjs, as I'm using it myself, but your point is not really with Yjs it seems but on how the interaction is with Prosemirror in your use case. I can see why you're bringing up your points against Yjs and I'm having a hard time understanding why you don't consider alternatives to Prosemirror directly. Put another way, "because this integration was bad the source system must also be bad". I do not condone this part of your article. Seems like a sunken cost fallacy to me and reasoning about it at anothers expense, but perhaps not. Hoping to hear back from you.
First, the fact that Yjs bindings, by design (for ~6 years), replace the entire document, does in my opinion, indicate a fundamental misunderstand what rich text editors need to perform well in any circumstance, not just collaborative ones. As I say in the article... I hope to be able to write another article that this has changed and they now "get" it, but for now I do not think it is appropriate to trust Yjs with this task for production-grade editors. I'm sorry to write this, but I do think it's true! I'm not trying to bag on anyone!
Second, and more material: to deploy Yjs to production in the centralized case, I think you are very much swimming against the current of its architecture. Just one example is permissions. There is no established way to determine which peers in a truly-p2p architecture have permissions to add comments vs edit, so you will end up using a centralized server for that. But that's not free, CRDTs are mechanically much more complicated! For example, you have to figure out how to disallow a user to make mark-only edits if they have "commenter" access, but allow editing the whole doc for "editor" access. This is trivial in `prosemirror-collab` (say) but it's very hard in Yjs because you have to map it "through" their XML transformations model.
I'm happy to talk more about this if it's helpful. But yes, we are trying to say some stuff about Yjs specifically, and some stuff about CRDTs generally.
That said, it's not without problems - I acknowledge that. But it's not as bad as you make it sound. You didn't list one concrete case when you had issues with it.
I'm very happy that Nick and I finally found funding to make a rewrite happen. It really did take 6 years to make this happen, because it's hard to find funding for open source projects.
Not sure what the error conditions loop like, but you could probably bootstrap message hashes in a metadata array in the object, along with encryption signatures to prevent unwanted updates to objects.
I made a tiny self contained implementation of this algorithm here if anyone is curious:
https://github.com/josephg/crdt-from-scratch/blob/master/crd...
FugueMax (or Yjs) fit in a few hundred lines of code. This approach also performs well (better than a tree based structure). And there's a laundry list of ways this code can be optimised if you want better performance.
If anyone is interested in how this code works, I programmed it live on camera in a couple hours:
https://www.youtube.com/watch?v=_lQ2Q4Kzi1I
This implementation approach comes from Yjs. The YATA (yjs) academic paper has several problems. But Yjs's actual implementation is very clever and I'm quite confident its correct.
Try to understand 3.1-3.4 in this paper, and you'll find that the correctness proof doesn't prove anything.
In particular, when they define <_c, they do this in terms of rule1, rule2, and rule3, but these are defined in terms of <_c, so this is just a circular definition, and therefore actually not a definition at all, but just wishful thinking. They then prove that <_c is a total order, but that proof doesn't matter, because <_c does not exist with the given properties in the first place.
It is a collaborative markdown file that also renders very fast. So far so good.
And then... it somehow adds Javascript? And React? And somehow AI is involved? I truly don't understand what it is, and I am (I think) the end customer...
edit: I tried it and I just get "Loading..." forever. So, anyway, next time.
EDIT: I will say I'm not against AI writing tools or anything like that. But, for better or worse, that's just not what happened here.
It is very true that there are nuances you have to deal with when using CRDT toolkits like Yjs and Automerge - the merged state is "correct" as a structure, but may not match your scheme. You have to deal with that into your application (Prosemirror does this for you, if you want it, and can live with the invalid nodes being removed)
You can't have your cake and eat it with CRDTs, just as you can't with OT. Both come with compromises and complexities. Your job as a developer is to weigh them for the use case you are designing for.
One area in particular that I feel CRDTs may really shine is in agentic systems. The ability to fork+merge at will is incredibly important for async long running tasks. You can validate the state after an agent has worked, and then decide to merge to main or not. Long running forks are more complex to achieve with OT.
There is some good content in this post, but it's leaning a little too far towards drama creation for my tast.
I think it's defensible to say that this point in particular is not indicting CRDTs in general because I do say the authors are trying to fix it, and then I link to the (unpublicized) first PR in that chain of work (which very few people know about!), and I specifically spend a whole paragraph saying I hope that I a forced to write an article in a year about how they figured it all out! If I was trying to be disingenuous, why do any of that?
It's easy to make that mistake reading your post because of sentences like
> I want to convince you that all of these things (except true master-less p2p architecture) are easily doable without CRDTs
> But what if you’re using CRDTs? Well, all these problems are 100x harder, and none of these mitigations are available to you.
It sure sounds a lot like you're calling CRDTs in general needlessly complex, not just the yjs-prosemirror integration.
This is all opensource software, provided for free by volunteers. If you want better bindings, go write them. Or pay someone else to do so.
EDIT: I live in Seattle and it is 12:34, so I must go to bed soon. But I will wake up and respond to comments first thing in the morning!
But, seeing how I've had several people who read your article write me asking about this miraculous collab implementation, I want to push back on the framing that ProseMirror's algorithm is 'simple' or '40 lines of code'. The whole document and change model in ProseMirror was designed to make something like prosemirror-collab possible, so there is a lot of complexity there. And it has a bunch of caveats itself—the server needs to hold on to changes as long as there may be clients to need to catch up, and if you have a huge amount of concurrent edits, the quadratic complexity of the step mapping can become costly, for example. It was designed to support rich text, document schemas, and at least a little bit of keeping intentions in mind when merging (it handles the example in the first post of your series better, for example), but it's not a silver bullet, and I'd hate for people to read this and go from thinking 'CRDT will solve my problems' to 'oh I need to switch to ProseMirror to solve my problems'.
With that said... I do not agree it is not "simple." Or at least, I think it is about as simple as it can possibly be.
There seems to be a conflict of interest with describing Yjs's performance, which basically does the same thing along with Automerge.
But the product seems much more narrow than an actual tool run the whole business in markdown. I was hoping to see Logseq on steroids, and it feels like a tool builder primarily. I love the tool building aspect, but the fundamentals of simply organizing docs (docs, presentations, assets etc, the basics of a business) are either not part of the core offering or not presented well at all.
I love the idea of building custom tools on top of MD and it's part of my wishlist, but I feel little deceived by your tagline so I wanted to share that :)
The logic that makes sense is you are using your own framing (Moment.dev will later be paid and people will be customers) to interpret Yjs.
Moreover, the 'social proof' posted by the following later on by 'auggierose' and 'skeptrune': - https://news.ycombinator.com/item?id=47396154 - https://news.ycombinator.com/item?id=47396139
Appears, to me, to be manufactured. The degree of consolidation in this 'SF/Bay Area tech cult' which I've noticed, although I am unsure if others are aware, that tries to help other members at the expense of quality, growing network wealth through favoritism rather than adherence to quality, is counterpoint to users whose interest is high quality software without capture.
While you may not like me describing this, it is not in your own interest to do this because it catabolizes the base layer that would sustain you. Social media catabolizes actual social networks, as AI catabolizes those who write information online. Behavior like this ruins the public commons over time.
It would have been nice if the article compared yjs with automerge and others. Jsonjoy, in particular, appears very impressive. https://jsonjoy.com/
They use Yjs: https://make.wordpress.org/core/2026/03/10/real-time-collabo...
The current implementation does suffer from the same issue noted for the Yjs-ProseMirror binding: collaborative changes cause the entire document to be replaced, which messes with some ProseMirror plugins. Specifically, when the client receives a remote change, it rolls back to the previous server state (without any pending local updates), applies the incoming change, and then re-applies its pending local updates; instead of sending a minimal representation of this overall change to ProseMirror, we merely calculate the final state and replace with that.
This is not an inherent limitation of the collaboration algorithm, just an implementation shortcut (as with the Yjs binding). It could be solved by diffing ProseMirror states to find the minimal representation of the overall change, or perhaps by using ProseMirror's built-in undo/redo features to "map" the remote change through the rollback & re-apply steps.
Specifically, it inspired the question: how can one let programmers customize the way edits are processed, to avoid e.g. the "colour" -> "u" anomaly*, without violating CRDT/OTs' strict algebraic requirements? To which the answer is: find a way to get rid of those requirements.
*This is not just common behavior, but also features in a formal specification [1] of how collaborative text-editing algorithms should behave! "[The current text] contains exactly the [characters] that have been inserted, but not deleted."
[1] http://www.cs.ox.ac.uk/people/hongseok.yang/paper/podc16-ful...
const result = step.apply(this.doc);
if (result.failed) return false;
I suspect this doesn't work.In general, the client implementation of collab is pretty simple. Nearly all of the subtlety lies in the server. But it, too, is generally not a lot of code, see for example the author's implementation: https://github.com/ProseMirror/website/tree/master/src/colla...
But there's another issue that the author hasn't even considered, and possibly it's the root cause why the prosemirrror (which I'd never heard of before btw) does the thing the author thinks is broken... Say you have a document like "请来 means 'please go'" and independently both the Chinese and English collaborators look at that and realise it's wrong. One changes it to "请走 means 'please go'" and the other changes it to "请来 means 'please come'". Those changes are in different spans, and so a merge would blindly accept both resulting in "请走 means 'please come'" which is entirely different from the original, but just as incorrect. Depending on how much other interaction the authors have, this could end up in a back and forth of both repeatedly changing it so the merged document always ended up incorrect, even though individually both authors had made valid corrections.
That example seems a bit hypothetical, but I've experienced the same thing in software development where two BAs had created slightly incompatible documents stating how some functionality should work. One QA guy kept raising bugs saying "the spec says it should do X", the dev would check the cited spec and change the code to match the spec. Weeks later, a different QA guy with a different spec would raise a bug saying "why is this doing X? The spec says it should do Y", a different dev read the cited spec, and changed the code. In this case, the functionality flip-flopped about 10 times over the course of a year and it was only a random conversation one day where one of them complained about a bug they'd fixed many times and the other guy said "hey, that bug sounds familiar" and they realised they were the two who'd been changing the code back and forth.
This whole topic is interesting to me, because I'm essentially solving the same problem in a different context. I've used CRDT so far, but only for somewhat limited state where conflicts can be resolved. I'm now moving to a note-editing section of the app, and while there is only one primary author, their state might be on multiple devices and because offline is important to me, they might not always be in sync. I think I'm probably going to end up highlighting conflicts, I'm not sure. I might end up just re-implementing something akin to Quill's system of inserts / deletes.
I also tried out the behaviour of their example. Slowing the sync time down to 3 seconds, and then typing "Why not" and then waiting for it to sync before adding " do this?" on client A and " joke?" on client B. The result was "Why not do this? joke?" when I'd have hoped that this would have been flagged as a conflict. Similarly, starting with "Why not?" and adding both " do this" and " joke" in the different clients produced "Why not do this joke?" even though to me, that should have been a conflict - both were inserting different content between "t" and "?".
Finally, changing "do" to "say" in client A and THEN changing "do" to "read" in client B before it updated, actually resulted in a conflict in the log window and the resultant merge was "Why not rayead this joke?" Clearly this merge strategy isn't that great here, as it doesn't seem to be renumbering the version numbers based on the losing side (or I've misunderstood what they're actually doing).
I am long-time CKEditor dev, I was responsible for implementing real-time collaboration in the editor and the OT implementation.
Regarding the first part of your article. Guess what - CKEditor would output "" :). And even better, if the user who deleted all does undo, you'd get "u" where it was typed originally.
However, I fully agree, that for every algorithm, you will be able to find a scenario where it fails to resolve conflict in a way expected by the user. But we cannot ask user to resolve a conflict manually every time it happens.
Offline editing, as you correctly observed, is more difficult, because the conflicts pile up, and multiple wrong decisions can result in a horrifying final result. I fully agree, that this is not only an algorithmic problem but also a UX problem. Add to this, that in many apps, you will also have other (meta)data that has to be synced too (besides document data).
CKEditor is, in theory, ready for offline editing. From algorithm POV, offline is no different than very very very slow connection (*). In the end, you receive a set of operations to transform against other set of operations. However, currently we put the editor in read-only state when the connection breaks. We are aware, that even if all transformations resolve as expected, then the end result may still be "weird". And even if the end result is actually as expected, the amount of changes may be overwhelming to a person who just got the connection back, so it still may be good to provide some UI/UX to help them understand what happened.
(*) - that is, unless the editing session on the server ended already, and, simply saying, you don't have anything to connect to (to pull operations from).
Regarding OT. I have a feeling that one mistake most people make, is that they take OT as it is described in some papers or article, and don't want to iterate over this idea. To me, this is not just one algorithm, rather an idea of how to think about and mange changes happening to the data.
For CKEditor, from the very beginning, we were forced to innovate over typical OT implementations. First of all we focused on users intentions. Second of all, we needed to adapt it to tree data structure. These challenges shaped my way of thinking - OT is "an idea", you need to adapt it to your project. Someone here asked if there's library for OT, because they want to use it for spreadsheets. I'll say -- write it on your own and adapt it to spreadsheets. You'll discover that maybe you don't need some operations, or maybe you need new operations dedicated for spreadsheets. This is what we ended up doing. @Reinmar already posted this link here, but we describe our approach here: https://ckeditor.com/blog/lessons-learned-from-creating-a-ri....
Circling back to your example with typing and removing whole sentence. This is how you innovate over OT. To us, such deletion is not deleting N singular characters starting from position P. The intention is to remove some continuous range of text. If someone writes inside the range, it just changes the boundary of stuff to remove, but surely we don't want to show some random letters after the deletion happens. We account for that and make modifications in our OT implementation.
Similarly with positions in document. In CKEditor, you can use LivePositions and LiveRanges, which are basically paths in tree data structure. Every position is transformed by operation too. Many features we have base on that.
So, my take here is -- don't bash OT because you based your experience on some simple implementations. Possibly the same is with Yjs. Don't bash CRDTs because Yjs is doing something badly?
And some final words regarding the second part.
We also follow the same pattern as your diagram shows in "How the simple thing works" section. As I was reading through the article, and looking at provided examples, it's hard for me not to think, that what's happening is some kind of an OT-variant, maybe simplified, or maybe adapted to some specific cases. But there are strong similarities between what you described and CKEditor 5, and we use OT. Like, looking at this from top-level view, I could say, "well, we do the same". We have the same loop with conflict resolution, we just call "rebase" a "transformation", and instead "steps" we have "operations".
Also, you say it is 40LOCs, but how much magic happens in `step.apply()`? How much the architecture was made to make it possible? Even Marijn makes this comment here: https://news.ycombinator.com/item?id=47409647.
For comparison, this is CKEditor's file that includes the OT functions to transform operations: https://github.com/ckeditor/ckeditor5/blob/master/packages/c.... It's 2600LOCs (!), but at least most of it are comments :). Again, the basic idea for OT is very simple (and this implementation could be simpler, we also learned a lot in the process). It's up to you how much you want to delve into solving "user intention" issues.
Right, but if you are already using ProseMirror that infrastructure is in place if you are taking advantage of it directly or bolting Yjs on top..
I really disagree with this article - despite protestation, I feel like their issue is with Yjs, not CRDTs in general.
Namely, their proposed solution:
1. For each document, there is a single authority that holds the source of truth: the document, applied steps, and the current version.
2. A client submits some transactional steps and the lastSeenVersion.
3. If the lastSeenVersion does not match the server’s version, the client must fetch recent changes(lastSeenVersion), rebase its own changes on top, and re-submit.
(3a) If the extra round-trip for rebasing changes is not good enough for you, prosemirror-collab-commit does pretty much the same thing, but it rebases the changes on the authority itself.
This is 80% to a CRDT all by itself! Step 3 there, "rebase its own changes on top" is doing a lot of work and is essentially the core merge function of a CRDT. Also, the steps needed to get the rest of the way to a full CRDT is the solution to their logging woes: tracking every change and its causal history, which is exactly what is needed to exactly re-run any failing trace and debug it.Here's a modified version of the steps of their proposed solution:
1. For each document, every participating member holds the document, applied steps, and the current version.
2. A client submits (to the "server" or p2p) some transactional steps and the lastSeenVersion.
3. If the lastSeenVersion does not match the "server"/peer’s version, the client must fetch recent changes(lastSeenVersion). The server still accepts the changes. Both the client and the "server" rebase the changes of one on top of the other. Which one gets rebased on top of the other can be determined by change depth, author id, real-world timestamp, "server" timestamp, whatever. If it's by server timestamp, you get the exact behavior from the article's solution.
If you store the casual history of each change, you can also replay the history of the document and how every client sees the document change, exactly as it happened. This is the perfect debugging tool!CRDTs can store this casual history very efficiently using run-length encoding: diamond-types has done really good work here, with an explanation of their internals here: https://github.com/josephg/diamond-types/blob/master/INTERNA...
In conclusion, the article seems to be really down on CRDTs in general, whereas I would argue that they're really down on Yjs and have written 80+% of a CRDT without meaning to, and would be happier if they finished to 100%. You can still have the exact behavior they have now by using server timestamps when available and falling back to local timestamps that always sort after server timestamps when offline. A 100% casual-history CRDT would also give them much better debugging, since they could replay whatever view of history they want over and over. The only downside is extra storage, which I think diamond-types has shown can be very reasonable.
Another important thing is that CRDTs by themselves cannot give you a causal ordering (by which I mean this[2]), because definitionally causal ordering requires a central authority. Funnily enough, the `prosemirror-collab` and `prosemirror-collab-commit` do give you this, because they depend on an authority with a monotonically increasing clock. They also also are MUCH better at representing the user intent, because they express the entirety of the rich text transaction model. This is very emphatically NOT the case with CRDTs, which have to pipe your transaction model through something vastly weaker and less expressive (XML transforms), and force you to completely reconstruct the `Transaction` from scratch.
Lastly for the algorithm you propose... that is, sort of what `prosemirror-collab-commit` is doing.
[1]: https://www.inkandswitch.com/peritext/
[2]: https://www.scattered-thoughts.net/writing/causal-ordering/
Having watched that space now for the last nearly 25 years... of all the projects I've abandoned over the years, that is the one that I am most grateful I gave up on. The gulf between "hey what if we could collaboratively edit live" and what it takes to actually implement it is one of the largest mismatches between intuition and reality I know of. I had no idea.
That's why I created prosemirror-collab-commit.
Or internally: [ [HLC, “id:2”, ‘parent_id’, “id:1”], [HLC, “id:2”, ‘type’, “text”], ... ]
Merging is easy, and allows for atomic modifications without rebuilding the entire tree, as well as easy conflict resolution. We add the HLC (clock, peer id). If the time difference between the two clocks is significant, we create a new field [HLC, id, “conflict:” + key, old_value]
Yes, here I agree: yjs core is well written, while plugins are “nice to have”.