Theodore Ts'o on how he uses Git when working on Linux (2017)

(lore.kernel.org)

48 points · nativecoinc · 3y ago · 7 comments

6 comments · 2 top-level

gumby · 3y ago · 4 in thread

There are two competing needs to be considered when figuring out what your workflow should be in regard to history.

Both come from the fundamental question: “When (if) we look back in history, what are we looking for?” Keeping everything as it was reduces the risk of deleting something that will later be important; consolidating is supposed to reduce the risk of missing the needle in the haystack or discouraging looking back at all.

Curating the past is 99% wasted effort since looking back is rare. I think the best compromise is to add some automation if you really care, as Ted suggested.

woolion · 3y ago

>Curating the past is 99% wasted effort since looking back is rare.

This is the worst kind of self-fulfilling prophecy. It is exactly the same as 'tidying your home is not worth it because you will need to search for things anyway'. For it to be useful, you need to have proper atomic commits and useful messages, same as a good organization, otherwise looking back would have little added value. And it's also something with little overhead if the discipline is integrated into your workflow.

The only reason you wouldn't need a good history is if your Git repository is 100% bug-free. Only then would you have no need to understand why or how a bug was introduced, or whether some weird piece of code handles a very specific edge case or was just poorly written. Is the bug generalizable? That is also something you'd probably see quickly by knowing whether it was introduced in a local commit or a refactoring one.

Code-wise, 'history-obliviousness' (or the lack of proper Git hygiene) is among the worst banes of programmers, I believe.

fulafel · 3y ago

I guess "curate" could be interpreted either way. It would seem possible you are both arguing for preserving history but interpreting the word differently...

nativecoinc (OP) · 3y ago

If you search through the commits on the Git project you’ll notice that they often reference previous commits in their commit messages. So yes: past history often comes in handy.

nativecoinc (OP) · 3y ago

There’s another concern: repository bloat.[1] Ts'o does not want this link to be a mandatory part of the final commit which everyone needs to get on every pull.[2]

His own proposal does not necessitate keeping “history blobs” around: just use Git commit metadata (trailers) and leave the pointed-at data (beyond the cherry-pick backlinks) in an external store like Gerrit.

I think other commenters who suggested no-changes-to-core-git solutions might have mentioned git-notes, which is similar to the external store point since git-notes are completely optional refs (i.e. if you have notes on your commits in your tree then no one else needs to know about it; those who want the metadata can fetch it, those who don’t can save their bandwidth).
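To make the "optional refs" point concrete, here is a minimal sketch that only builds the relevant command lines. Notes live under `refs/notes/`, which a default clone's fetch refspec does not cover, so they travel only when someone asks for them. The `review` namespace and the helper names are my own assumptions for illustration; nothing in the linked thread prescribes them.

```python
def notes_ref(name):
    """Full ref name for a notes namespace, e.g. refs/notes/review."""
    return f"refs/notes/{name}"


def add_note_cmd(commit, text, name="review"):
    """argv that attaches `text` to `commit` in the given notes namespace."""
    return ["git", "notes", f"--ref={name}", "add", "-m", text, commit]


def fetch_notes_cmd(remote="origin", name="review"):
    """argv that explicitly fetches one notes ref; nothing fetches it by default."""
    ref = notes_ref(name)
    return ["git", "fetch", remote, f"+{ref}:{ref}"]


def log_with_notes_cmd(name="review"):
    """argv that shows history with notes from that namespace alongside commits."""
    return ["git", "log", f"--notes={name}"]


# "review" is a hypothetical namespace name chosen for this sketch.
for cmd in (add_note_cmd("HEAD", "LGTM after v3"),
            fetch_notes_cmd(),
            log_with_notes_cmd()):
    print(" ".join(cmd))
```

Each list can be handed to `subprocess.run(cmd, check=True)` inside a repository. Anyone who never runs the fetch simply never downloads the notes.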

[1]:

> If the complaint about Gerrit is that it's not a core part of Git, the challenge is (a) how to carry the code review comments in the git repository, and (b) do so in a way that it doesn't bloat the core repository, since most of the time, you don't want or need to keep a local copy of all of the code review comments going back since the beginning of the project.

[2]: I’m guessing that people who want a “replaces” link would also want to make it optional, in keeping with the best-of-both-worlds mantra.

nativecoinc (OP) · 3y ago

The OP[1] wants the best of git-merge and git-rebase: to be able to rewrite history as well as to have a new “replaces” pointer which points back to the commit before the rebase happened (basically).

> I've been calling this proposal `git replay` or `git replace` but I'd like to hear other suggestions for what to name it. It works like rebase except with one very important difference. Instead of orphaning the original commit, it keeps a pointer to it in the commit just like a `parent` entry but calls it `replaces` instead to distinguish it from regular history. In the resulting commit history, following `parent` pointers shows exactly the same history as if the commit had been rebased. Meanwhile, the history of iterating on the change itself is available by following `replaces` pointers. The new commit replaces the old one but keeps it around to record how the change evolved.
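As a rough illustration of the quoted idea, here is a toy model of the two kinds of pointers. Real Git commits have no `replaces` field; the `Commit` class, the `walk` helper, and all commit ids below are invented for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    """Toy commit object; `replaces` is the proposed pointer, not a Git feature."""
    id: str
    parent: Optional["Commit"] = None    # regular history
    replaces: Optional["Commit"] = None  # the version of the change this one rewrote


def walk(commit, field):
    """Follow one kind of pointer from `commit` and list the ids visited."""
    ids = []
    while commit is not None:
        ids.append(commit.id)
        commit = getattr(commit, field)
    return ids


base_old = Commit("base-old")
base_new = Commit("base-new", parent=base_old)

# Three iterations of one change; each rewrite points back at the previous one.
v1 = Commit("fix-v1", parent=base_old)
v2 = Commit("fix-v2", parent=base_old, replaces=v1)  # amended in place
v3 = Commit("fix-v3", parent=base_new, replaces=v2)  # rebased onto base-new

print(walk(v3, "parent"))    # ['fix-v3', 'base-new', 'base-old']
print(walk(v3, "replaces"))  # ['fix-v3', 'fix-v2', 'fix-v1']
```

Following `parent` gives the same linear history a plain rebase would leave behind, while following `replaces` recovers the earlier iterations that a rebase would normally orphan.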

Ts'o thinks[2] that this is too simplistic, citing some workflows that he has experience with from working on Linux. He says that they use metadata in the form of key-value pairs in the commit messages in order to track cherry-picks across trees, how to test the commit, and even the fact that a commit has been dropped in the current tree.[3]

> My experience, from seeing these much more complex use cases --- starting with something as simple as the Linux Kernel Stable Kernel Series, and extending to something much more complex such as the workflow that is used to support a Google Kernel Rebase, is that using just a simple extra "Replaces" pointer in the commit header is not nearly expressive enough. And, if you make it a core part of the commit data structure, there are all sorts of compatibility headaches with older versions of git that wouldn't know about it. And if it then turns out it's not sufficient for the more complex workflows anyway, maybe adding a new "replace" pointer in the core git data structures isn't worth it. It might be that just keeping such things as trailers in the commit body might be the better way to go.
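For a feel of what "trailers in the commit body" means in practice, here is a simplified parser for `Key: value` lines in the last paragraph of a commit message. The trailer names (`Upstream-Commit`, `Test-Command`, `Dropped-Commit`) and the messages are made up for this sketch and are not the keys any real kernel workflow uses; Git's own `git interpret-trailers --parse` applies more lenient rules than this.

```python
import re

# A trailer line looks like "Some-Key: some value".
TRAILER = re.compile(r"^([A-Za-z0-9-]+):\s*(.*)$")


def parse_trailers(message):
    """Return (key, value) pairs from the last paragraph of a commit message.

    Simplified on purpose: no continuation lines, and the whole last
    paragraph must consist of trailer lines to count as a trailer block.
    """
    paragraphs = [p for p in message.strip().split("\n\n") if p.strip()]
    if len(paragraphs) < 2:  # a lone subject line is never a trailer block
        return []
    pairs = []
    for line in paragraphs[-1].splitlines():
        match = TRAILER.match(line)
        if match is None:
            return []  # ordinary body text, not trailers
        pairs.append((match.group(1), match.group(2)))
    return pairs


# Hypothetical messages; the keys and ids are invented for illustration.
picked = """example: fix a race

Longer explanation of the change.

Upstream-Commit: 1234abcd
Test-Command: make check
"""
dropped = "Drop example patch\n\nDropped-Commit: 1234abcd\n"

print(parse_trailers(picked))   # [('Upstream-Commit', '1234abcd'), ('Test-Command', 'make check')]
print(parse_trailers(dropped))  # [('Dropped-Commit', '1234abcd')]
```

The second message is the sort of thing an empty commit (`git commit --allow-empty`) could carry to record that a patch was dropped, without touching the tree.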

[1] https://lore.kernel.org/git/CALiLy7pBvyqA+NjTZHOK9t0AFGYbwqw...

[2] Submission link

[3] How? By making an “empty commit” (I presume: a commit which doesn’t change the tree) and adding the “dropped” metadata to the commit message.
