I haven't used any of these yet; I've been waiting for one to obviously "win" and gain some support, but I have definitely felt the pain.
Use case: sending a Kafka message to add or remove a value from an array field without having to know the rest of the array.
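A rough sketch of what such a message could look like. The message shape (`op`, `field`, `value`) and the `apply_array_op` helper are made up for illustration and do not follow any particular patch standard; the consumer adds or removes by value, so the producer never needs to see the rest of the array.

```python
import json
from typing import Any


def apply_array_op(doc: dict[str, Any], message: str) -> dict[str, Any]:
    """Apply a hypothetical add/remove-by-value message to one array field."""
    msg = json.loads(message)
    field, value = msg["field"], msg["value"]
    items = list(doc.get(field, []))
    if msg["op"] == "add":
        # Treat the array as a set so replayed messages are harmless.
        if value not in items:
            items.append(value)
    elif msg["op"] == "remove":
        items = [item for item in items if item != value]
    else:
        raise ValueError(f"unsupported op: {msg['op']}")
    # Return a new document; the input is left untouched.
    return {**doc, field: items}


doc = {"id": 7, "tags": ["a", "b"]}
doc = apply_array_op(doc, '{"op": "add", "field": "tags", "value": "c"}')
doc = apply_array_op(doc, '{"op": "remove", "field": "tags", "value": "a"}')
print(doc)
```

Making both operations idempotent is a deliberate choice here, since a message queue may deliver the same message more than once.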
Add in defending against pathologically malformed patches - several vulnerabilities were discovered in the golang patch libraries Kube used - and one starts to develop a twitch whenever it is mentioned…
You potentially need much more in scenarios where you are applying a patch to a document that has since been modified. It has limited support for some common operations in those cases, but that is quite far from everything that could be useful. Still, they needed to draw the line somewhere, and where they stopped is reasonable for a fair few scenarios, though certainly not every possible one.
oh no.
> What will the distributed data environment in Web3 look like?
As a web developer, I expected an interesting JSON-related post, so I was kind of disappointed by the intro (and it's not just the first sentence, the entire introductory paragraph talks about “web3”)…
> This conflict can be surfaced to Alice, and Bob can be allowed to go about his business. Could this particular problem be resolved in a purely automatic way with a CRDT? Definitely, but it probably will not result in what you want. Last first will work of course, but then which is more right might need human review, and even worse it might result in both results being interleaved (a likely outcome!).
The article goes on to suggest that with a system of sharing patches, we can synchronize our distributed data stores with more precise semantics, even if we do need human intervention on conflicts sometimes. Part of this is agreeing on patch order:
> We can stack either patch in any order without difficulty. Perhaps we ask Bob and Alice to agree on the application order (using pull / push as is done with git). But maybe we just allow them to apply when they arrive. The answer depends on the workflow.
Do you know what a system that works like pull->rebase->push sounds like to me? If you squint a little? This sounds like operational transforms [1]. Especially if you are considering multiple different patching semantics -
> Which of these you want, however, requires semantic direction of the diff algorithm. While lots of structured diff problems will be solved by the simplest algorithm, ultimately we need to have a schema that helps to direct the meaning of our diffs. String fields might be best line-based, word-based, or perhaps they must always be atomic (as with identifiers).
Each patching semantic is a different type of Operation. Rebasing your local changes before sending your pending patches is the Transform. The main advantage of OT systems over CRDTs is that OT also allows for conflicts & human in the loop conflict resolution. @josephg built a JSON Operational Transform library [2] that has interesting operations like Move (something diff/patch really struggles with) as well as conflict resolution.
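To make the Operation/Transform split concrete, here is a toy sketch with a single operation type, a list insert. It is not the API of the library mentioned above; `Insert`, `transform`, `apply` and the `site` tie-breaker are names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    index: int
    value: str
    site: int  # tie-breaker when two inserts target the same index


def transform(op: Insert, against: Insert) -> Insert:
    """Rewrite `op` so it can be applied after `against` has been applied."""
    shifts = against.index < op.index or (
        against.index == op.index and against.site < op.site
    )
    return Insert(op.index + 1, op.value, op.site) if shifts else op


def apply(doc: list[str], op: Insert) -> list[str]:
    return doc[: op.index] + [op.value] + doc[op.index :]


base = ["a", "b", "c"]
alice = Insert(1, "X", site=1)
bob = Insert(1, "Y", site=2)

# Each side applies its own op first, then the other's op "rebased" over it.
at_alice = apply(apply(base, alice), transform(bob, alice))
at_bob = apply(apply(base, bob), transform(alice, bob))
print(at_alice, at_bob)
```

Both replicas end up with the same list even though they applied the two inserts in opposite orders, which is the property the rebase step is there to provide.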
The thing I like about the OT model is that it’s pretty easy to nest other approaches inside OT. Want to express 5 different patch semantics? Make an operation type for each. Want to support CRDT as well? Sure, make an operation type called CRDTUpdate that contains whatever delta data the underlying CRDT system would send.
No matter what strategy you pick, remember to fuzz test your distributed sync system for convergence.
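A minimal sketch of such a convergence fuzz test, assuming a deliberately simple merge rule (last writer wins per key, with unique stamps). The merge rule and all names are stand-ins; the point is the shape of the test: generate random operations, deliver them to several replicas in different orders, and assert that every replica ends in the same state.

```python
import random


def apply_op(state: dict, op: tuple) -> None:
    """Last-writer-wins per key; the stamp decides, not the arrival order."""
    key, value, stamp = op
    current = state.get(key)
    if current is None or stamp > current[1]:
        state[key] = (value, stamp)


def fuzz_convergence(rounds: int = 200, seed: int = 0) -> int:
    rng = random.Random(seed)
    for _ in range(rounds):
        # Stamps are (clock, sequence number); the sequence number keeps them unique.
        ops = [
            (rng.choice("abc"), rng.randint(0, 99), (rng.randint(0, 5), i))
            for i in range(20)
        ]
        replicas = []
        for _ in range(3):
            delivery = ops[:]
            rng.shuffle(delivery)  # each replica sees a different arrival order
            state: dict = {}
            for op in delivery:
                apply_op(state, op)
            replicas.append(state)
        assert replicas[0] == replicas[1] == replicas[2], (ops, replicas)
    return rounds


checked = fuzz_convergence()
print(f"{checked} rounds converged")
```

Swapping `apply_op` for your real merge or transform logic and keeping the failing `ops` in the assertion message gives you a reproducible counterexample when convergence breaks.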
[1]: https://en.m.wikipedia.org/wiki/Operational_transformation
I write a little bit more about the relationship here: https://github.com/terminusdb/technical-blogs/blob/main/blog...
https://git-scm.com/book/en/v2/Getting-Started-What-is-Git%3...
It makes git significantly faster (because it doesn't apply a sequence of patches when checking out) and much more robust than a patch and diff system.
I think it's an important point here because it does play a bit of a counterpoint to the OP's argument. In fact, the git storage system is so simple precisely because it is purely content-addressed and has no notion of history or difference:
https://git-scm.com/book/en/v2/Git-Internals-Git-Objects
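As a small illustration of content addressing, here is a sketch that computes a blob ID the way git does for a default SHA-1 repository (a `blob <size>\0` header followed by the content) and uses it as the key of a toy in-memory store. The `put` helper and the `store` dict are made up for the example; real git also compresses objects and writes them to disk.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Hash a blob as git does in a SHA-1 repository: header plus raw content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


store: dict[str, bytes] = {}


def put(content: bytes) -> str:
    """Store a full snapshot under its own hash; no diffs, no history."""
    object_id = git_blob_id(content)
    store[object_id] = content
    return object_id


v1 = put(b'{"name": "alice"}')
v2 = put(b'{"name": "alice", "age": 30}')
same = put(b'{"name": "alice"}')  # identical content maps to the same key
print(v1 == same, len(store))
```

Two versions of a document are simply two entries; a diff between them is computed on demand rather than stored.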
In the article, the author mentions Alice and Bob sharing only differences in JSON as a way to simplify changes. If anything, the lesson from git's success should be that the author's approach (the diff/patch way of Subversion etc.) is exactly the wrong one. Communicating only with full, valid JSON documents and showing diffs only as a convenience to the user would be the actual "git way".
> If anything the lesson from git's success should be that the author's approach (the diff/patch way of Subversion etc.) is exactly the wrong one.
The problem with subversion was not the mechanism of storage of objects. And the strength of git doesn't lie there either. It's rather the very flexible nature of its commit metadata and its design, which builds on the idea of multiple masters. This was what was stifling in subversion.
Whether you store the deltas or the individual states is trivially equivalent. You can inter-convert them if you'd like, so clearly this can't be a game changer.
But even if it were and git's approach was "The Right Thing", this would not be a lesson to learn from git when dealing with data storage because it scales poorly.
Every state transition would have to re-represent the entire state of the database. This will work fine for code, but for databases with tons of objects changing and being extended all the time it would be crazy. It makes a lot more sense to store the differences.
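On the earlier point that deltas and full states can be inter-converted, a minimal sketch for flat JSON-like objects. The delta format (`set` / `unset`) and the function names are invented for the example, and nested documents would need a recursive diff.

```python
def diff(old: dict, new: dict) -> dict:
    """Delta from `old` to `new`: keys to set and keys to drop."""
    return {
        "set": {k: v for k, v in new.items() if k not in old or old[k] != v},
        "unset": [k for k in old if k not in new],
    }


def patch(state: dict, delta: dict) -> dict:
    result = {k: v for k, v in state.items() if k not in delta["unset"]}
    result.update(delta["set"])
    return result


def snapshots_to_deltas(snapshots: list[dict]) -> list[dict]:
    deltas, previous = [], {}
    for snapshot in snapshots:
        deltas.append(diff(previous, snapshot))
        previous = snapshot
    return deltas


def deltas_to_snapshots(deltas: list[dict]) -> list[dict]:
    snapshots, state = [], {}
    for delta in deltas:
        state = patch(state, delta)
        snapshots.append(state)
    return snapshots


history = [{"a": 1}, {"a": 1, "b": 2}, {"b": 3}]
deltas = snapshots_to_deltas(history)
print(deltas_to_snapshots(deltas) == history)
```

The round trip is lossless in both directions; what differs is the cost profile, since the delta list grows with the size of each change while the snapshot list grows with the size of the whole state.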