At one extreme, you could keep track of all your keystrokes in the editor so that you could have a full history of your work including backspaces to correct typos.
On the other extreme is the mythical programmer who crafts perfect commits in exactly the correct order on the first attempt.
Most mortal programmers need the ability to iterate over their code to get it to a reasonable place before they want it enshrined in the blessed commit history that is shared with others or otherwise retained over time. The intermediary states, with false starts, poor implementations, hand-crafted functions instead of standard functions, poorly named classes, and so on are all part of development but aren't particularly interesting to keep in the permanent history of a project.
Git's ability to manipulate the existing commit tree (amend, reset, rebase, etc.) is extremely useful for this normal 'exploratory' development. Once a stable point has been reached, though (often because the tree has been published or shared with others), these commands become inappropriate and a different set of tools becomes relevant (revert, merge, etc.).
1) Reviews are much more enjoyable when the commits reflect the final understanding of the problem rather than false starts etc.
2) Looking back through history is much more enjoyable when the commits reflect the final understanding of the problem rather than false starts etc.
I've always hated the common description of 'rebase' as 'rewriting history'. None of the existing commits are modified by rebase; new commits are added and the branch names are shuffled around.
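A small shell sketch of that point, assuming a POSIX shell with `git` and `mktemp` available (the repository, file names and identity are throwaway examples). After the rebase, `feature` points at a new commit, while the original commit object is still in the repository and readable by its hash until it eventually expires and is garbage-collected.

```shell
#!/bin/sh
# Sketch: rebase adds new commits and moves the branch name; the old commit stays.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"          # throwaway identity for the demo
git config user.email "dev@example.com"
git config commit.gpgsign false

echo base > file.txt
git add file.txt
git commit -q -m "base"
main=$(git rev-parse --abbrev-ref HEAD)     # "master" or "main", depending on config

git checkout -q -b feature
echo feature > feature.txt
git add feature.txt
git commit -q -m "add feature"
before=$(git rev-parse HEAD)                # the feature commit as first written

git checkout -q "$main"
echo more >> file.txt
git commit -q -am "advance $main"

git checkout -q feature
git rebase -q "$main"                       # replay "add feature" onto the new tip
after=$(git rev-parse HEAD)

echo "feature before rebase: $before"
echo "feature after rebase:  $after"
git cat-file -t "$before"                   # the old commit still exists: prints "commit"
```

The two hashes differ because the replayed commit has a different parent; nothing about the old commit was edited in place.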
Arguably it's more important to have clean commit history in open source projects. I used to be strict about this at work, but I've relaxed a bit lately.
Generally, development in companies moves much faster than in open source libraries (in terms of number of commits per week), and in the open source world it's generally expected that you deliver clean, well-tested, working solutions rather than hacks that can be fixed tomorrow if needed; because of that, open source maintainers have higher standards for commits.
I agree with you that history should be left alone; mostly I think of the YAGNI argument that it's futile to think you have a better idea of what future developers want to see than those future developers themselves.
My repo histories are riddled with stuff like "finished X", "stubbed out Y", "fix typo in X", but at least nothing has been hidden from future devs who might be digging around for their purposes, regardless of whatever elegant story I might come up with.
If you rapidly realize that you committed to the wrong branch, or left a line of code half-finished, or misspelled a word, there's very little value in logging that. If you had seen it two seconds before the commit you would have fixed it without a second thought, so why insist on preserving it seconds after the commit?
Presumably (if only for safety) no one is using 'undo' on anything pushed to a shared repo. I can appreciate the argument that we shouldn't rewrite history into a nice, streamlined narrative, but I don't see much reason to avoid tools like 'amend' for fixing commit messages, or 'revert' when some silly line of test code gets committed (and not pushed).
When I'm dealing with other people's Git histories, I appreciate the middle ground approach most. There's no point in spinning some imaginary, elegant story - if it's not real history then write up an essay instead of storing it in your 'history'. But I also don't need to see every line of "oops, un-stubbed X" - my experience is that at least for immediate fixes it only makes things harder to read.
At the very least, review your commits and clean them up before pushing: squash 'fix' commits into the commits they fix if you didn't --amend them, and review commit messages. What the commit does should be obvious from the subject.
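Folding 'fix' commits into the commits they fix is what Git's `--fixup` and `--autosquash` options are for. A minimal sketch, assuming a POSIX shell; setting `GIT_SEQUENCE_EDITOR=:` only accepts the generated todo list so the interactive rebase runs unattended (normally you would look it over in your editor). File names, contents and messages are made up.

```shell
#!/bin/sh
# Sketch: fold a later "fix" commit into the commit it fixes, before pushing.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"
git config commit.gpgsign false

echo "v0" > README
git add README
git commit -q -m "Initial commit"
base=$(git rev-parse HEAD)

echo "helo" > greeting.txt                  # typo slips in here
git add greeting.txt
git commit -q -m "Add greeting"
greeting=$(git rev-parse HEAD)

echo "bye" > farewell.txt
git add farewell.txt
git commit -q -m "Add farewell"

echo "hello" > greeting.txt                 # typo noticed later
git commit -q -a --fixup="$greeting"        # subject becomes "fixup! Add greeting"

# Move the fixup next to its target and squash it in; ":" accepts the todo list as-is.
GIT_SEQUENCE_EDITOR=: git rebase -q -i --autosquash "$base"

git log --oneline                           # three commits, no "fixup!" left
```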
I've also seen people write git commits as if they were a work log - things like "fixed a bug", or "implemented feature X". That's the wrong way (IMO) to do git history; what the commit message should say is what the /commit/ does, not what you did.
There is no I in programming.
-- Added this thing
-- Fixed typo
-- Capitalized the letter
Etc.
Where I work, our commits from years ago are like that, and since nearly all the people from that era have moved on, the history is practically useless when trying to determine what they were working on and why they were working on it.
In fact, I found what appeared to be a logical error in one of the many tools we have deployed. I tried to track down when it was added and the commit just said something like "fixing integration tests".
So not only did they change some tests, they also added some code.
In other words, the reason that line of code was added is very well hidden from this "future" (now present) dev.
Messages like "stuff", "it works", "xxx" and "everything I did last month" are not so good, but very common. Moreover, if you are in the habit of avoiding them (that is, having each commit do one thing with a clear intention), then you will keep finding times when you wish you had done something differently an hour ago. And then `rebase -i` is your friend.
And in case you want to commit to an open source project you are basically forced to rewrite history for any non-trivial change because changes that make sense to develop at the same time often form independent PRs.
Git history serves a few purposes. First, it provides an overview of development so that someone can use `git log` to quickly figure out what's been done. Second, it provides context to code changes so that someone using `git blame` can figure out why some code looks the way it does. Finally, it provides a set of distinct points for clean manipulation of history via `git revert`, `git bisect` etc.
From the point of view of those purposes, there's no real value in a history that accurately reflects the development process. The ideal commit has a few properties: it should address only one concern, it should contain all the immediate code changes addressing that concern, and it should not be overly long. Commits like that make navigating and manipulating the git history easy.
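To make the `git bisect` point concrete, here is a minimal sketch, assuming a POSIX shell and `git` (the repository and its "bug" are invented). When each commit makes one small change, `git bisect run` can land on the exact commit that introduced a regression, driven by any command whose exit status means good (0) or bad (1 to 127, except 125, which means "skip this commit").

```shell
#!/bin/sh
# Sketch: find the first bad commit automatically with `git bisect run`.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"
git config commit.gpgsign false

# Eight small commits; the "regression" lands in change 6.
for n in 1 2 3 4 5 6 7 8; do
  if [ "$n" -ge 6 ]; then echo fail > status.txt; else echo pass > status.txt; fi
  echo "$n" > version.txt
  git add status.txt version.txt
  git commit -q -m "Change $n"
done

good=$(git rev-list --max-parents=0 HEAD)   # the root commit ("Change 1") is known good
git bisect start HEAD "$good" >/dev/null    # HEAD is known bad
# grep exits 0 (good) when status.txt is exactly "pass", 1 (bad) otherwise.
git bisect run grep -qx pass status.txt >/dev/null
first_bad=$(git rev-parse refs/bisect/bad)
git bisect reset >/dev/null 2>&1

git log -1 --format=%s "$first_bad"         # prints "Change 6"
```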
There's nothing wrong with keeping an accurate picture of history. It's just not actually useful.
Except for the relatively rare cases when it is incredibly useful. But that's why we have the reflog.
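As a sketch of that escape hatch (assuming a POSIX shell; the commits are throwaway): a commit dropped by `git reset --hard` is no longer on any branch, but the reflog still records where HEAD was, so `HEAD@{1}` brings it back.

```shell
#!/bin/sh
# Sketch: recover a commit "lost" to reset --hard via the reflog.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"
git config commit.gpgsign false

echo one > notes.txt
git add notes.txt
git commit -q -m "one"
echo two > notes.txt
git commit -q -am "two"
lost=$(git rev-parse HEAD)

git reset -q --hard HEAD~1       # "two" is no longer reachable from any branch
git reflog -n 2                  # HEAD@{0} is the reset, HEAD@{1} is commit "two"
git reset -q --hard 'HEAD@{1}'   # move the branch back to where HEAD was one step ago
cat notes.txt                    # prints "two"
```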
But it's equally valid to consider all your commits important in my history as well. It just depends on what you want. Personally, I never rebase, but I can understand why some people like that feature.
Many devs are happy with what you send them... as long as the history is right!
And right seems totally random to me.
Most are happy if you simply send them "one commit", and will tell you to squash multiple commits before they accept it.
Others say "let's split this or that" before they accept it.
Then I have to go back and fiddle around with Git just to get my change landed...
Having 10 tiny commits that are just failed attempts to fix a bug isn't practical. It makes reading and understanding your repository code _harder_. Git rebase helps me keep my log clean and understandable, and thus something I can work with in the future.
I know some people are using merge commits or pull requests as a place to put this information - but maybe we need an explicit mechanism for grouping together commits and summarizing them? I'm imagining something along the lines of code folding. Such a grouping might have other uses too (e.g. signaling that there's a group of commits where the tests will fail, so bisect should skip to the last commit).
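The merge-commit approach already gets part of the way to that folding. A sketch, assuming a POSIX shell (branch names and messages are made up): merge each feature with `--no-ff` so the merge commit carries the summary, then `git log --first-parent` shows only the summaries while plain `git log` still shows every step.

```shell
#!/bin/sh
# Sketch: a --no-ff merge commit acts as a "folded" summary of a group of commits.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"
git config commit.gpgsign false

echo "app" > app.txt
git add app.txt
git commit -q -m "Initial commit"
main=$(git rev-parse --abbrev-ref HEAD)

git checkout -q -b parser
for step in "Stub out parser" "Fix typo in parser" "Finish parser"; do
  echo "$step" >> app.txt
  git commit -q -am "$step"
done

git checkout -q "$main"
git merge -q --no-ff -m "Add parser" parser   # the summary lives on the merge commit

git log --oneline --first-parent   # folded: "Add parser", "Initial commit"
git log --oneline                  # unfolded: every intermediate step as well
```

It is only an approximation of the idea above: the grouping is a convention on the main branch, not something Git enforces.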
What's the actual value in that intermediate commit? Other than seriously contrived scenarios I can't think of any of the "legitimate use cases" you mention. If it's someone else's code I never want to see that intermediate commit.
Where's the tradeoff?
But if you're working at home on your own project and just want to sync between a few machines, what do you do? Commit, "Hashing on the Liststore kinda works, some bugs." Push. Go to your home machine, work some more, and finally squash all those kinda-working commits into one commit ... potentially even needing a "git push --force" (which you can safely do since you're the only developer?)
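A minimal sketch of that workflow, assuming a POSIX shell; a local bare repository stands in for the server, and the "laptop"/"desktop" clones, file names and messages are invented. The WIP commits are squashed with `git reset --soft` and published with `git push --force-with-lease`, which, unlike plain `--force`, refuses to overwrite the remote branch if it no longer matches what this clone last saw there.

```shell
#!/bin/sh
# Sketch: squash WIP commits synced through a remote, then force-push safely.
set -e
work=$(mktemp -d)
cd "$work"
git init -q --bare server.git                # stands in for the hosted remote

identity() {                                 # throwaway identity for each clone
  git config user.name "Example Dev"
  git config user.email "dev@example.com"
  git config commit.gpgsign false
}

git clone -q server.git laptop 2>/dev/null   # silence the "empty repository" warning
cd laptop
identity
echo v1 > liststore.txt
git add liststore.txt
git commit -q -m "Start hashing the Liststore"
branch=$(git rev-parse --abbrev-ref HEAD)
base=$(git rev-parse HEAD)                   # last commit worth keeping as-is
echo v2 > liststore.txt
git commit -q -am "Hashing on the Liststore kinda works, some bugs."
git push -q origin "$branch"

cd "$work"
git clone -q -b "$branch" server.git desktop
cd desktop
identity
echo v3 > liststore.txt
git commit -q -am "Fix remaining hashing bugs"
git push -q origin "$branch"

# Squash everything after $base into one commit and publish it.
git reset -q --soft "$base"
git commit -q -m "Hash Liststore entries"
git push -q --force-with-lease origin "$branch"
```

The other machine still has the old WIP commits afterwards, so it needs a fetch and reset before continuing there.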
I agree with you totally though. It shouldn't just be a branch. There should be a way to group x edits into one big commit. That's the atomic unit that has a specific feature, and all the mini-commits inside of it should be totally abstracted except for specific deep searching commands.
One of my greatest challenges in using/understanding Git was (and still is) the reflog. I know the reflog isn't that complicated, but there isn't anything really analogous in other SCMs (to my limited knowledge of Perforce, svn, hg, and git). Also, for some reason, the reflog's presentation is intimidating UI-wise.
The reflog is a nice gem in git, particularly since the built-in Mercurial rollback (I wish they would just remove that command) is fairly awful (use histedit or rebase instead). That said, Mercurial's new experimental changeset evolution work looks really promising [1].
That said, if you are looking to undo in hg the way this article describes, you have to look at the backup bundles and restore one:
  hg unbundle backupfile
Unbundle is a pretty nasty command compared to the reflog commands, but on the other hand it is just restoring from a backup file. I'm not too sure how you can transfer reflogs around. That said, I've been using changeset evolution for over a year and it is awesome. Instead of creating a bundle, your commits are just hidden. You can run any of your hg log commands with --hidden and it shows you those hidden commits. You can see exactly how your rebase removed (hid) some old commits and created new ones. It's very easy.
Mercurial 3.9 ships with the journal extension, which is a bit like the git reflog: https://www.mercurial-scm.org/wiki/JournalExtension
Do you commit after every 20 seconds of typing?
Why not?
The people cleaning up history have the same motivation.
Once you've published it, let it go. Do not mess with history. I think we can all agree on that.
The disagreement is whether I should squish my commits before I push them, and I put that in the same pile of questions as whether I am obligated to my significant other and society at large to shave my legs before going out in public.
I think part of the problem is the distinction between history and audit. What you are thinking of is a full audit: every change made by everyone to get to and from each state.
History can sometimes be this, but sometimes you just want the solid states; the extra detail of every step in between, including failed steps that were backtracked, is more information than people want and can result in cognitive overload.
Different people want different detail.
Sometimes the same people want different detail for different tasks. One option might be a feature that lets you mark a commit as intermediate. Commits with that flag would be kept around but not displayed, and things like bisect would not operate on them by default. They would be displayed, and could be acted upon, only when an extra option is provided (not by default: you don't want a typo in a commit ID to select a commit that exists but isn't the one you are looking for).
Better than painstakingly accurate history is useful history. I don't care that Joe was distracted one day and had to make a fix-up commit. I care that he authored a certain change.
[1] https://twitter.com/michaelhenke/status/585142133167751169
What's wrong with organizing your source code in a single directory? Why do people insist on organizing it into subdirectories and sub-subdirectories instead of letting the main directory accurately reflect the size of the code base?
I was an hg person too, but I came over to the git side when I needed to collaborate with people. git's killer feature is merges. And merges benefit from fine-grained commits that rewrite history.
It's not enough to memorize command line "incantations", you have to understand what's happening.
Git is a sophisticated tool. There are over 100 subcommands!
Once you fully understand the basic terms (e.g., "detached", "HEAD", "branch", "commit", etc.) Git becomes less confusing.
http://www.verticalsysadmin.com/git/flyer.html describes a free webinar we offer on Git basics -- people who have used Git for years come away surprised how much they've learned.
You have a widely adopted tool with some real or perceived flaws. Everybody knows them and wants to fix them.
But unless you somehow get mass adoption from the start, the project flounders because everyone will be pointing out that you can't install & use the new tool in restricted environments or on very old environments.
So we're left with the lowest common denominator.
At this point, to break the cycle, either the original developers come up with a 2.0 interface and push it hard (which might cause backlash: https://xkcd.com/1172/) or someone with a ton of pull and resources does it from outside (which could trigger a fork or other unpleasantness).
Why the heck wasn't the UI improved back then, before the thing was even released?
I mean, while your explanation is correct, it doesn't explain or excuse the pure incompetence of the original developers when it comes to usability issues. Their laziness or ignorance back then has confused and irritated thousands or millions of developers now, and continues to, and will continue to for the foreseeable future.
Make sure the shit you're going to set in stone is good before you grab the chisel, guys. You're professional software developers, not clowns.
Sorry for the rant.
Personally I prefer the CLI, it's the only tool that I can rely on to do what I tell it to do and to know what's happening. But it takes time and effort to get used to it.
In the same way one can argue whether you call a braeburn a fruit or an apple.
CLI is a subset of UI
The problem with Git (well, one of many many problems with Git) is that it conflates its user interface with machine interfaces-- which means tools that have to work with Git (like those GUI clients) have to use the CLI to do so. They don't have a more powerful option, like an officially-supported API or a shared library they could call into. This is terrible software design.
Chalk that up to the power of fashion and a misguided notion of technical proficiency.
Attempts at different UIs fail because they're all trying to put an abstraction over top of git that doesn't actually reflect the underlying data. As a result, they're limited to the set of git functionality that overlaps their abstraction, and the tools are less powerful.
https://stevebennett.me/2012/02/24/10-things-i-hate-about-gi...
> Once you understand Git's data model, the UI is perfectly intuitive
So it's not intuitive at all.
Not to mention that every damn command is inconsistent with every other command! To remove a file, git rm. To remove a branch, git branch -D. To remove a commit, git reset --hard HEAD^. How is this intuitive, consistent, or even sane?
"I understand git" and "git is easily understandable" are completely different. Git is not easily understandable, at all.
Ideally, with all of the -p commands, git wouldn't actually apply any of the changes I specified until it was about to quit (i.e. either when I advance past the end of the set of potentially-affected hunks, or I manually type 'q'), and then would prompt me for whether {set of operations I specified} is what I wanted to do. This would give the -p operations the flexibility to expose an 'u'ndo.
http://bryan-murdock.blogspot.com/2013/06/git-branches-are-n...
For example, I consider that CVS and Subversion don't have branches, just "copies". To my mind, what git branch does is exactly what branching is.
  [alias]
      cancel = reset --soft HEAD^
I don't want an alias for a hard reset; it seems too dangerous and a good way to lose some work. However, a soft reset like this allows me to cancel the last commit and add an omitted file, remove one from the commit, or simply correct the commit message.

  git add <file>
  # or "git rm --cached <file>" to remove
  git commit --amend
and it will replace it with a new commit that has what you want. It's like a mini rebase -i.

The only non-intuitive thing might be that calling e.g. 'git undo' twice doesn't undo the last two changes: the first one undoes the last change and the second one undoes the undo.
~/git-undo.awk:
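  # Prints how many reflog entries back "git undo" should reset to.
  # Requires GNU awk (three-argument match); pass i (changes to undo) with -v.
  BEGIN { jmp = 0 }
  {
    # $2 looks like "<ref>@{3}:"; c[1] captures the entry index (3)
    match($2, /\{([0-9]+)\}/, c);
    if (c[1] == jmp)
    {
      jmp++;
      if ($3 == "reset:")
      {
        # an earlier reset ("moving to <ref>@{N}"): continue from the entry it moved to
        match($6, /\{([0-9]+)\}/, x);
        jmp += x[1];
      }
      else
        i--;
      if (i == 0)
      {
        print jmp;
        exit;
      }
    }
  }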
  git config --global alias.undo '!f() { b=$(git rev-parse --abbrev-ref HEAD); git reset --hard "$b@{$(git reflog show "$b" | awk -v i="${1-1}" -f ~/git-undo.awk)}"; }; f'
Redo is also possible, but I don't have time to do it now.

You need commits, you need the ability to merge. If you don't want to force all commits to happen online and everyone to resolve conflicts immediately, then you need branches. You want tags if you're going to have releases (otherwise how do you refer to them?). At that point you basically have git.
All the complicated features were added because someone thought they needed them (there are certainly a few git features where I think that someone was wrong, but not many).
It quickly falls apart, though, as you don't have a proper history, reverts, or branching - it's OK for static sites but a disaster for anything more than that.
If you want to know how many steps back you need to undo, you still need to check the reflog. This means you're better off just resetting manually to the change you want.
2. merging two or more different branches
3. transporting code (publishing it)
3a. accepting somebody else's patches
4. descriptions of history points
4a. pointers to parent code trees (especially with tree merges)
5. history traversals (bisecting, among the others)
Not to mention that you need either administrative privileges to create a snapshot or a special kind of filesystem that supports this for non-administrators. And a sysadmin still needs to prepare such a filesystem for your $HOME.
As far as administration, most devs will have elevated rights or can use FUSE or something. We could talk about needing administrative privileges to install git, too, but it's pretty far removed from the central topic.