What's the Difference: JSON diff and patch

(github.com)

39 points · ggleason · 4y ago · 27 comments

ninja_daro_yco4y ago

We already have JSON Patch (http://jsonpatch.com/), and we can send diffs as JSON Patch documents, so I don't understand the purpose of this proposal.

skrtskrt4y ago

We also have JSON Merge Patch (https://www.rfc-editor.org/rfc/rfc7396), and both have very little adoption (judging by the available libraries & integrations)

I haven't used any of these yet. I've been waiting for one to obviously "win" and gain some support, but I have definitely felt the pain.

Use case: Sending a Kafka message to add or remove a value from an array field without having to know the rest of the array
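
To make the array use case concrete, here is a minimal sketch of the merge algorithm described in RFC 7396. It suggests why JSON Merge Patch alone does not cover this case: arrays are replaced wholesale, so the sender has to know the whole array. The function name `merge_patch` and the sample document are illustrative assumptions, not code from any particular library.

```python
def merge_patch(target, patch):
    """Apply a JSON Merge Patch (RFC 7396) and return a new top-level object."""
    # Anything that is not an object (including an array) replaces the target outright.
    if not isinstance(patch, dict):
        return patch
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            # null in a merge patch means "delete this member".
            result.pop(key, None)
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


doc = {"id": 7, "tags": ["a", "b"]}
# Intended as "add c", but the array is replaced rather than extended.
merged = merge_patch(doc, {"tags": ["c"]})
print(merged)
```

JSON Patch (RFC 6902) can append without knowing the array by using a path that ends in `/-`, for example `{"op": "add", "path": "/tags/-", "value": "c"}`. Its `remove` operation addresses array elements by index, though, so removing by value still requires knowing where the value sits.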

smarterclayton4y ago

Kubernetes supports both JSON Patch and merge patch, and it has been a continuous source of amazement to me (in the “that’s a bloody giant hole in the ground” sense) how hard it is to actually calculate, represent, and develop durable intuition about the differences between, and the merging of, JSON-like trees (empty lists, null vs empty, representing insertion vs deletion, lists with primary-keyed members, list uniqueness, field ordering, duplicate fields, number equivalence, etc.).

Add in defending against pathologically malformed patches - several vulnerabilities were discovered in the golang patch libraries Kube used - and one starts to develop a twitch whenever it is mentioned…
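
One of the difficulties listed above, representing insertion vs deletion in lists, can be illustrated with a small sketch. It compares a naive position-by-position comparison with an alignment-based one built on Python's standard `difflib.SequenceMatcher`. The helper names and sample lists are assumptions made for illustration; this is not how Kubernetes computes patches.

```python
from difflib import SequenceMatcher

old = ["a", "b", "c"]
new = ["x", "a", "b", "c"]  # a single element inserted at the front


def positional_changes(a, b):
    """Compare element i with element i; length differences become adds/removes."""
    changes = [("replace", i) for i in range(min(len(a), len(b))) if a[i] != b[i]]
    changes += [("add", i) for i in range(len(a), len(b))]
    changes += [("remove", i) for i in range(len(b), len(a))]
    return changes


def aligned_changes(a, b):
    """Align the two lists first, then report only the non-matching regions."""
    opcodes = SequenceMatcher(None, a, b, autojunk=False).get_opcodes()
    return [tuple(op) for op in opcodes if op[0] != "equal"]


print(positional_changes(old, new))  # every position looks changed
print(aligned_changes(old, new))     # one insertion
```

Both results describe the same change, but only the aligned one matches the intuition "one element was inserted", and the two would behave very differently if applied to a list that had been modified in the meantime.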

ninja_daro_yco4y ago

Actually, we use JSON Patch/JSON Pointer extensively to limit the data sent over the network or to update live data on a client (it's the embedded world). If a user changes a single parameter on a client, we send just a JSON Patch as defined in the RFC (including array modifications). The libraries are mature and simple to use (on the client; on the server we use C++, so I suppose it's an in-house solution), which is why it's hard for me to understand why we need another solution for the same problem. Having said that, I haven't used them in the Kafka world.
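
The "send only the changed parameter" workflow described above could look roughly like the sketch below, which derives RFC 6902 operations from two versions of a document. It is a simplification under stated assumptions: only nested objects are walked, arrays and scalars are treated as atomic values, and the names `escape` and `diff` and the sample device document are hypothetical.

```python
def escape(token):
    """Escape a member name for use in a JSON Pointer (RFC 6901)."""
    return str(token).replace("~", "~0").replace("/", "~1")


def diff(old, new, path=""):
    """Return a list of RFC 6902 operations that turn `old` into `new`."""
    if isinstance(old, dict) and isinstance(new, dict):
        ops = []
        for key in old:
            if key not in new:
                ops.append({"op": "remove", "path": f"{path}/{escape(key)}"})
        for key, value in new.items():
            child = f"{path}/{escape(key)}"
            if key not in old:
                ops.append({"op": "add", "path": child, "value": value})
            else:
                ops.extend(diff(old[key], value, child))
        return ops
    # Caveat: Python treats 1 == 1.0 and True == 1 as equal, which may or may
    # not match the number equivalence you want for JSON.
    if old != new:
        return [{"op": "replace", "path": path, "value": new}]
    return []


before = {"device": {"name": "pump-1", "rate": 10}, "enabled": True}
after = {"device": {"name": "pump-1", "rate": 12}, "enabled": True}
print(diff(before, after))
```

Only the single `replace` operation for `/device/rate` needs to travel over the network, rather than the whole document.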

Bjartr4y ago

For a(n anec)datapoint, I'm using JSON Patch in a production application.

LukeEF4y ago

TerminusDB person here: we were using the API internally for our immutable collaboration database service [1] so we thought we'd make it available for others to use. We think it is better than the one derived from the RFC and built specifically for collaboration, but you can be the judge of that! In any case - choice is good, right?

[1] https://terminusdb.com/

ggleasonOP4y ago

We use JSON-patch/diff internally for our rebase operation. We looked at implementing it on RFC 6902, but it turned out to be awkward because of the path descriptions, and insufficiently rich for the kinds of patches that we needed. We could have extended RFC 6902, but then we'd be in the embrace / extend / destroy pattern, no?

jsmith454y ago

Yeah, RFC 6902 works best when you are only trying to create patches that are safe to apply to exactly the expected document.

You potentially need much more for some scenarios where you could be applying a patch to a previously modified document. It has some limited support for some common operations for these, but quite far from everything that could be useful. But it is also true that they needed to put the limit somewhere, and where they stopped is reasonable for a fair few scenarios, but certainly not every possible one.
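
The "limited support" mentioned above plausibly refers to the RFC 6902 `test` operation, which lets a patch assert the value it expects before changing it. The sketch below applies a tiny subset of JSON Patch (`test` and `replace` on non-root object members only, no arrays) and works on a copy so that a failed `test` leaves the input untouched. The names `PatchConflict`, `resolve`, and `apply_patch` are hypothetical.

```python
import copy


class PatchConflict(Exception):
    """Raised when a `test` operation does not match the document."""


def resolve(doc, path):
    """Return (parent, key) for a JSON Pointer into nested objects."""
    tokens = [t.replace("~1", "/").replace("~0", "~") for t in path.split("/")[1:]]
    parent = doc
    for token in tokens[:-1]:
        parent = parent[token]
    return parent, tokens[-1]


def apply_patch(doc, ops):
    """Apply `test` and `replace` operations to a copy of `doc` and return it."""
    result = copy.deepcopy(doc)
    for op in ops:
        parent, key = resolve(result, op["path"])
        if op["op"] == "test":
            if key not in parent or parent[key] != op["value"]:
                raise PatchConflict(f"unexpected value at {op['path']}")
        elif op["op"] == "replace":
            if key not in parent:
                raise KeyError(op["path"])
            parent[key] = op["value"]
        else:
            raise ValueError(f"unsupported op: {op['op']}")
    return result


patch = [
    {"op": "test", "path": "/rate", "value": 10},     # guard: expected old value
    {"op": "replace", "path": "/rate", "value": 12},  # the actual change
]
print(apply_patch({"rate": 10}, patch))
try:
    apply_patch({"rate": 11}, patch)  # document was modified in the meantime
except PatchConflict as err:
    print("conflict:", err)
```

This guards against applying a patch to a document that has drifted, but it only detects the conflict; it does not say how to resolve it, which is the gap the comment above points at.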

littlestymaar4y ago· 3 in thread

> web3

oh no.

dang4y ago

"Please don't pick the most provocative thing in an article or post to complain about in the thread. Find something interesting to respond to instead."

https://news.ycombinator.com/newsguidelines.html

littlestymaar4y ago

Honestly, I don't think this applies here, given this is the first sentence:

> What will the distributed data environment in Web3 look like?

As a web developer, I expected an interesting JSON-related post, so I was kind of disappointed by the intro (and it's not just the first sentence, the entire introductory paragraph talks about “web3”)…

xeromal4y ago

How are you everywhere, dang?

jitl4y ago· 2 in thread

The article discusses the shortcoming of CRDTs - sometimes you do want a conflict that a human can resolve, instead of an algorithm's best guess:

> This conflict can be surfaced to Alice, and Bob can be allowed to go about his business. Could this particular problem be resolved in a purely automatic way with a CRDT? Definitely, but it probably will not result in what you want. Last first will work of course, but then which is more right might need human review, and even worse it might result in both results being interleaved (a likely outcome!).

The article goes on to suggest that with a system of sharing patches, we can synchronize our distributed data stores with more precise semantics, even if we do need human intervention on conflicts sometimes. Part of this is agreeing on patch order:

> We can stack either patch in any order without difficulty. Perhaps we ask Bob and Alice to agree on the application order (using pull / push as is done with git). But maybe we just allow them to apply when they arrive. The answer depends on the workflow.

Do you know what a system that works like pull->rebase->push sounds like to me? If you squint a little? This sounds like operational transforms [1]. Especially if you are considering multiple different patching semantics -

> Which of these you want, however, requires semantic direction of the diff algorithm. While lots of structured diff problems will be solved by the simplest algorithm, ultimately we need to have a schema that helps to direct the meaning of our diffs. String fields might be best line-based, word-based, or perhaps they must always be atomic (as with identifiers).

Each patching semantic is a different type of Operation. Rebasing your local changes before sending your pending patches is the Transform. The main advantage of OT systems over CRDTs is that OT also allows for conflicts & human in the loop conflict resolution. @josephg built a JSON Operational Transform library [2] that has interesting operations like Move (something diff/patch really struggles with) as well as conflict resolution.

The thing I like about the OT model is that it’s pretty easy to nest other approaches inside OT. Want to express 5 different patch semantics? Make an operation type for each. Want to support CRDT as well? Sure, make an operation type called CRDTUpdate that contains whatever delta data the underlying CRDT system would send.

No matter what strategy you pick, remember to fuzz test your distributed sync system for convergence.

[1]: https://en.m.wikipedia.org/wiki/Operational_transformation

[2]: https://github.com/ottypes/json1
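
As a rough illustration of both the Operation/Transform idea and the advice to fuzz test for convergence, here is a minimal sketch for one operation type only: concurrent inserts into a list. It is not the algorithm used by ottypes/json1 or TerminusDB; the function names and the tie-breaking rule (operation `a` always wins ties) are assumptions.

```python
import random


def apply_insert(items, op):
    """Apply an (index, value) insert and return a new list."""
    index, value = op
    return items[:index] + [value] + items[index:]


def transform(op, other, op_wins_ties):
    """Rewrite `op` so it can be applied after the concurrent insert `other`."""
    index, value = op
    other_index, _ = other
    if other_index < index or (other_index == index and not op_wins_ties):
        return (index + 1, value)  # `other` landed before us, so shift right
    return op


def converges(base, a, b):
    """Both sites apply both inserts, in opposite orders, and must agree."""
    site_1 = apply_insert(apply_insert(base, a), transform(b, a, op_wins_ties=False))
    site_2 = apply_insert(apply_insert(base, b), transform(a, b, op_wins_ties=True))
    return site_1 == site_2


def fuzz(rounds=1000, seed=0):
    """Check convergence on random bases and random pairs of concurrent inserts."""
    rng = random.Random(seed)
    for _ in range(rounds):
        base = [rng.randrange(10) for _ in range(rng.randrange(6))]
        a = (rng.randrange(len(base) + 1), "A")
        b = (rng.randrange(len(base) + 1), "B")
        assert converges(base, a, b), (base, a, b)
    return rounds


print(fuzz())
```

A real system would need transforms for every pair of operation types (delete, move, replace, and so on), which is where the number of cases grows and where a fuzz test like this tends to earn its keep.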

ggleasonOP4y ago

It absolutely does sound like Operational Transforms! There are a lot of similarities, and in many cases patch re-ordering for specific ordered fields will require the sort of index manipulations that are common in OT - a sort of commutator patch.

I write a little bit more about the relationship here: https://github.com/terminusdb/technical-blogs/blob/main/blog...

LukeEF4y ago

We had a discussion around these various approaches recently: https://github.com/terminusdb/technical-blogs/blob/main/blog...

xmcqdpt24y ago· 1 in thread

While I agree that people generally think of version control (and git) in terms of diff and patch, I think it's important to note that git itself doesn't have diff and patches. The fact that git doesn't have diffs is one of its defining characteristics.

https://git-scm.com/book/en/v2/Getting-Started-What-is-Git%3...

It makes git significantly faster (because it doesn't apply sequence of patches when checking out) and much more robust than a patch and diff system.

I think it's an important point here because it does play a bit of a counterpoint to the OP's argument. In fact, the git storage system is very simple precisely because it's content-addressed only and has no notion of history or difference:

https://git-scm.com/book/en/v2/Git-Internals-Git-Objects

In the article, the author mentions Alice and Bob sharing only differences in JSON as a way to simplify changes. If anything, the lesson from git's success should be that the author's approach (the diff/patch way of Subversion etc.) is exactly the wrong one. Communicating only with full, valid JSON documents and showing diffs only as a convenience to the user would be the actual "git way".

ggleasonOP4y ago

But actually git does use diff and patch. It just doesn't store objects as diffs and patches. Whenever you do a cherry pick operation or rebase operation it synthesises patches. You can try yourself by cherry-picking a merge commit. What even does this mean? Well, git doesn't know either because it can't figure out how to synthesise the patch unless you tell it.

> If anything the lesson from git's success should be that the author's approach (the diff/patch way of Subversion etc.) is exactly the wrong one.

The problem with Subversion was not the mechanism of storage of objects. And the strength of git doesn't lie here either. It's rather the very flexible nature of its commit metadata and its design, which builds on the idea of multiple masters. This was what was stifling in Subversion.

The method of storage, whether of the deltas or of the individual states, is trivially equivalent. You can inter-convert them if you'd like, so clearly this can't be a game changer.

But even if it were and git's approach was "The Right Thing", this would not be a lesson to learn from git when dealing with data storage because it scales poorly.

Every state transition would have to re-represent the entire state of the database. This will work fine for code, but for databases with tons of objects changing and being extended all the time it would be crazy. It makes a lot more sense to store the differences.
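
The claim above that stored states and stored deltas can be inter-converted can be sketched for the simplest case, flat key/value documents. The delta format (`set` and `unset`) and all function names here are illustrative assumptions, not TerminusDB's or git's representation.

```python
def make_delta(old, new):
    """Describe how to get from `old` to `new` for flat dicts."""
    return {
        "set": {k: v for k, v in new.items() if k not in old or old[k] != v},
        "unset": [k for k in old if k not in new],
    }


def apply_delta(state, delta):
    """Return a new state with the delta applied."""
    result = {k: v for k, v in state.items() if k not in delta["unset"]}
    result.update(delta["set"])
    return result


def to_deltas(snapshots):
    """Full states -> one delta per consecutive pair of states."""
    return [make_delta(a, b) for a, b in zip(snapshots, snapshots[1:])]


def to_snapshots(first, deltas):
    """First state plus deltas -> the full list of states."""
    states = [first]
    for delta in deltas:
        states.append(apply_delta(states[-1], delta))
    return states


snapshots = [{"a": 1}, {"a": 1, "b": 2}, {"b": 3}]
deltas = to_deltas(snapshots)
print(deltas)
print(to_snapshots(snapshots[0], deltas) == snapshots)
```

The round trip shows the two representations carry the same information. What differs is cost: the delta list grows with the size of each change, while the snapshot list grows with the size of the whole state at every step, which is the scaling concern raised above.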

bradjonesca4y ago

The ability to accommodate CRDTs in the architecture (in future) is a game changer.
