Kudos to GitHub for providing this feature that a lot of people have asked for. I obviously don't plan to use it, but I appreciate that it's an option for those people that like their small, harmless lies. ;)
For example: I check out a repository, and create a local feature branch. I create a commit containing the tests for the new feature, then one for the first draft of the new feature, then two or three for bugfixes. Each commit is small, and self-contained, but importantly isn't standalone. If someone checked out the repository in the middle of my chain of commits, they wouldn't have a working product. Then I upload my change for code review. There's no point in reviewing each of my ~5 commits individually: they only make sense to the reviewer as a combined unit. And there's no point in landing them individually: they only make sense for the overall project history as a combined unit.
In a project with many developers (e.g. 1,000 like the Chromium project), every developer has different local practices. Some keep their work based on HEAD of master via rebase, others via merges. Some do test-driven development, some don't. Making the code review the atomic unit of work, rather than the messy string of local commits, helps the project enforce common etiquette, commit formatting, and readable history.
We only allowed squash commits on master because of what you're describing. That is the level where history "made sense". However, for code review, we wanted to support both styles, because there is an advantage sometimes to seeing the sausage being made. For instance someone will refactor something -- maybe change a method name. Then they apply that refactoring at all the call sites. Very conscientious developers would break this into two commits. We didn't want the first commit on master, but it made sense to review this way, because it was easier on the reviewers: change, effect of change on everything else.
I call this "telling a story" with your commits. There's a lot of value in that style if you have the time to do it.
The other style of commit-by-commit reviewing, where I see all of the work in progress commits, I don't find valuable at all and I _definitely_ don't want to see on master.
If that were true, the optimal solution is PRs with individual commits that all pass testing. I find it much easier to review a series of small changes for logical correctness than mashing them together into a single PR. GitHub recently added this as a feature, so I'm not in a completely invisible minority there.
And then, when the review is over, having discrete commits makes git bisecting down to the commit that broke the system more granular.
I agree that code reviews should be important, and key to understanding the history of a project at a more useful timescale. I just disagree that they should be "atomic" and that a reviewer (even, or especially, a later code archeologist) may not have reason to inspect or dive into smaller units within code reviews.
Where I think that we may agree is that I feel that even if they shouldn't necessarily be "atomic", I agree that code reviews should probably be first-class objects when talking about and dealing with source control. In git, you can use --no-ff merges today as a useful approximation of code review boundaries (especially with PRs and GitHub's default --no-ff and including linking PR #s). It might be nice to see code reviews or other aggregates of commits/commit graphs be truly first-class citizens of git in some manner.
For people who review code for inclusion in a project, want to track meta-progress on issues, want to pin versions for release, etc., mutable history means they can squash fixups or fix your whitespace for you, rebase changesets onto other changesets, have history that reflects the project management strategy, etc.
...tidy commits are aberrations and full of little lies...
...small, harmless lies.
Interesting choice of words.
Here's another way to think about squashing private commits for public consumption: programmers do not install keyloggers and upload their entire keystroke history including every Backspace and Ctrl+Z used in their text editor to the repositories. And, most of us wouldn't care about seeing it.
Whether John typed "x = 218^H^H73" or "x = 273" is a meaningless distinction and irrelevant noise. Those spurious ^H Backspaces are equivalent to the twitchy multiple commits in private branches. We really don't want to see them. Think of private noisy commits as an extended workspace of a text editor. If squashing those commits is a lie, the Backspace key without an audited keystroke log is also a lie.
Side note: The other comment downthread about keeping all private commit history for "git bisect" is a red herring. Sometimes a commit will deliberately have broken syntax -- e.g. make a quick commit before getting up to grab a soda -- it won't be CI test worthy. Besides, an automated CI server's cpu cycles can point to an upstream integration/test/qa branch instead of a programmer's private branch.
In a world with infinite storage space and a good UX on top of it, I could absolutely see a case where it might be amazing to have a source control integration with the full undo stack of my editors. VCR roll through someone's efforts Twitch style and grab a box of popcorn as you drinking game your way through their typos...
That said, I definitely will rebase/squash local WIP stuff on local-only branches on my own machine, as I see fit. Yeah, I see those as harmless lies because I really didn't build it that way, but sometimes that's what makes me feel better about publishing that work.
I appreciate you trying to push this conversation towards its extreme, absurd ends, but I also realize that there are a lot of aesthetic judgments here and I for one lean towards keeping more of the little pieces and the interesting digressions like here's where I totally "brb grabbing a soda" the whole branch and sometimes you taking that break means easier commits for me to review when I'm reviewing your code (whether a code review in a PR immediately, or a research effort down the line) as maybe I need a review break there too. I appreciate not everyone feels the same on this topic.
It's also extraordinarily cluttered, and it really gets in the way when someone later wants to do a 'git bisect' to track down when a bug was introduced.
When I'm working I do frequent little commits just to capture and back up my broken stream-of-thought experiments. None of those are going to be relevant to people who work with this code in the future; how does it benefit them to impose my haphazard process on them?
Of course if there's a way to break up my final commit into more meaningful smaller commits, I do that rather than one monolithic commit.
For example if I clean up some whitespace issues, add some new comments to old code, and implement a new feature, I'll put those in three separate commits even if I originally did the entire change at once. In this case you'll see more commits in the public history than I originally had.
Or if I check in a new version of some external library, add an API call that uses it along with its tests, and add UI code that calls the API, those may be separate commits in that order.
My goal is to make the public history useful to future developers.
I did a rebase once because there was a big mess I was trying to clean up in order to make a merge work. I would be surprised if anyone has ever cared about the details.
(though the counterpoint is that maybe there are parts of the granular history that are just busted for other reasons)
In particular, I would kill for support for basically the `git log --first-parent` option in viewing the history of a branch on tools like GitHub and GitLab. Rather than squashing your branch, you make your merge commit have a meaningful commit message (which you do anyways for a squashed commit) so that you don't always have to be viewing the tangled web underneath.
There should be no practical difference between a squashed commit and a merge commit from the perspective of the branch that commit is on (they represent the exact same change from parent to child), but the tooling insists on giving you the most complicated possible view all the time so there is a tangible difference.
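For anyone who wants to check that claim locally, here is a minimal sketch. The throwaway repository, the file name and the branch names (feature, with-merge, with-squash) are made up for illustration. It merges the same feature branch once with --no-ff and once with --squash, compares the resulting trees, and then prints the --first-parent view of the merged branch.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # fixed branch name, whatever the local default is
git config user.name "Example Dev"
git config user.email "dev@example.invalid"

echo base > app.txt
git add app.txt
git commit -q -m "Initial commit"

# A feature branch with a few small commits, including a WIP one.
git checkout -q -b feature
echo "step 1" >> app.txt; git commit -q -am "WIP"
echo "step 2" >> app.txt; git commit -q -am "Add feature"
echo "step 3" >> app.txt; git commit -q -am "Fix typo in feature"

# Option 1: a real merge commit with a meaningful message.
git checkout -q -b with-merge master
git merge -q --no-ff -m "Add feature (merge of feature branch)" feature

# Option 2: a squash commit for the same branch.
git checkout -q -b with-squash master
git merge -q --squash feature
git commit -q -m "Add feature (squashed)"

# Both results record the same tree, i.e. the same parent-to-child change.
merge_tree=$(git rev-parse 'with-merge^{tree}')
squash_tree=$(git rev-parse 'with-squash^{tree}')
echo "merge tree:  $merge_tree"
echo "squash tree: $squash_tree"

# The first-parent view of the merged branch skips the feature's own commits.
git log --first-parent --oneline with-merge
first_parent_count=$(git rev-list --first-parent --count with-merge)
full_count=$(git rev-list --count with-merge)
echo "first-parent commits: $first_parent_count, all commits: $full_count"
```

The difference between the two options is only in what stays reachable: the merge keeps the three feature commits underneath it, the squash does not.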
In terms of feature branches:
The individual engineer is free to do individual commits in their branch as they need to in order to keep track of their work. Before they submit a pull request, they should rebase and squash all of their commits into a single one that thoroughly describes everything in the feature that is being committed. When used in conjunction with tools like Phabricator, Arcanist, and commit templates, the workflow is very smooth.
When another team member goes to code review their pull request, rather than having to examine multiple individual commits there is only a single one to examine and comment on.
Master history:
Rather than cluttering up the mainline history with 'Did this', 'Did that', 'Merged: Did this', 'Merged: Did that', 'Reverted: Merged: Did this' etc, you get a series of commits that articulately describe what each commit was for. In the event you need to revert a feature because it breaks something, it's much easier to revert that single commit than trying to hunt through all of the individual commits from an engineer's feature branch. And in that case, if you revert one of the commits from the feature branch it could break something else.
A better approach would be to create multiple small commits that work and are self contained. It's ok for commit N to depend on the preceding commit, but each N should be able to stand on its own.
If devs rebase everything before pushing and push often (therefore also rebasing often), conflicts will happen a lot less often. Devs can also use their private branches for temporarily saving all WIP, squashing/rewording only what makes sense before submitting the PR or pushing to master.
Anyway, to each their own, and I appreciate your preferences differ from mine.
I can "experientially" state that squash throws away very necessary information for anyone trying to make sense of old code.
Here is a (made up) but generally realistic git log:
git log | grep -i WIP
mon 5pm - WIP, going to work on this from home
tue 4:45pm - WIP, going to work on this from home
wed 2:30pm - WIP, meeting
wed 5pm - WIP
thu Noon - WIP, working from the cafe on my laptop
fri 5pm - WIP, working from home
sat 3pm - WIP, heading home sick for the day
Does it really matter to anyone, and count as anything but noise, to know that I committed my work into the repository just so that I could work on it from a different computer? I can't imagine how low the signal-to-noise ratio would be if every person on the team did this.
Maybe not information that I care to do much more than skim, of course, unless I'm your manager looking for reasons why you might be working from home too much. :) (That said, there's probably some cool deep learning applications here...)
Some of the information, for instance, is that maybe you are working on pieces too large at a time and should find more ways to break them into smaller units of work that you can more easily commit one logical piece at a time rather than "snapshot dumps" between computers.
Like I said, from a hyperbolic standpoint, how the sausage is made isn't pretty and is full of garbage sometimes, but it is informative.
Ideally, every commit that I'm making should be preferably not big, but logically complete and working. The problem is that sometimes I want to work on another branch and I have to commit in the middle of the work so I can checkout into the other branch in which case (without a squashing merge) the "offending" WIP commit would end up in the master branch's history.
I would hate to have done a bisect to land on your commit "WIP, heading home sick for the day" as the one that caused the bug. By all means, create a WIP branch and if you can then push this WIP branch to the main repo, but please use commit squashing into logical units of change when merging into the feature branch or main trunk!
But it's a lot easier to convince devs that using tools with bad UI elements is hardcore and makes them look smart instead of demanding improvements.
Certainly there are a lot of people that seem to prefer imperative mutability, and more power to them, but maybe we learn from all of this and build better tools too.
I agree that code once committed to master should be immutable, but your own private commits should be malleable. It's not a matter of feeling superior or trying to look smarter, it's just really commonsense that you should try to submit easy to understand code changes and remove as much unnecessary extraneous rubbish as possible.
Imagine working on a fast-changing block of code with 20 or so other people concurrently. If all 20 of those people have patches submitted to master for review, and all of those patches have multiple commits each, then every time one gets merged, the others will have to rebase to its changes and fix conflicts for _every single commit_ while the fast forward plays out. It's a horrible experience. It prevents desirable code from getting in as contributors drop out due to the browbeating.
Squashing commits can be the difference between tediously fixing something once vs. tediously fixing it 20 times. No one needs to know that you changed your mind about calling that struct "ConfigOpts" before it was ever introduced upstream.
I'm mostly sort of advocating a "rebase none of the things" approach. Fix conflicts only when they happen in a branch (the GitHub PR system very nicely doesn't let you merge branches that conflict with your target branch and with CI information even better it won't let you merge branches that don't build). It's really not a bad experience.
I would probably be much happier not squashing stuff if you were able to bisect cleanly to points between branch merges. I don't think that's even theoretically possible; which means that when you're bisecting it's quite possible to pick up half-baked mid-branch points that you have to recognise for the broken rubbish they are - lots of false positives there occasionally, and makes bisect a lot less useful. On a repo with nice squashed commits, you tend to be able to narrow down to the feature very quickly - of course those commits then tend to be bigger etc., but I find that less of an issue.
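One way to soften the half-baked-commit problem, sketched here under made-up assumptions (a toy repository, a marker file called BROKEN standing in for "does not build", and a grep standing in for the real test suite): "git bisect run" treats exit code 125 from the test command as "this commit cannot be tested, skip it", so WIP snapshots do not get blamed.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.invalid"

# Small helper: stage everything and commit with the given message.
commit() { git add -A; git commit -q -m "$1"; }

echo ok > app.txt;                 commit "Initial working version"
git tag known-good
touch BROKEN;                      commit "WIP, heading home"       # does not "build"
rm BROKEN; echo more >> app.txt;   commit "Finish refactor"
echo bug >> app.txt;               commit "Add feature"             # introduces the bug
git tag culprit
touch BROKEN;                      commit "WIP, meeting"            # does not "build"
rm BROKEN; echo done >> app.txt;   commit "Finish feature"

# Stand-in test command, kept outside the work tree so checkouts don't touch it:
#   125 = cannot test this commit (skip), 1 = bad, 0 = good.
check=$(mktemp)
cat > "$check" <<'EOF'
[ -e BROKEN ] && exit 125
grep -q bug app.txt && exit 1
exit 0
EOF

git bisect start HEAD known-good >/dev/null
git bisect run sh "$check" >/dev/null
found=$(git rev-parse refs/bisect/bad)
git bisect reset >/dev/null 2>&1

git log -1 --format='first bad commit: %s' "$found"
```

This only helps when the broken commits are recognisable by the test command; if the skipped commits sit right next to the real culprit, bisect can only narrow it down to a range. As far as I know, newer Git releases (2.29 and later) also accept "git bisect start --first-parent", which walks only the merge points on the mainline and is closer to what the comment above asks for.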
Also, what if there was a tool on top of bisect that could better utilize GitHub PR JSON to target the search pattern? That could even save you some time in the case where you already have CI information attached to your PRs...
What's truly meaningful IMO is a git log that reads like a product change log
I have a self-congratulating black belt in source code archeology. With the right tools, most of which are on GitHub, even, such as good commit range diffing, smart uses of tags and branches, and knowing how to navigate the DAG from merge commits (more reason to --no-ff) you have a lot of power in your hands.
«What's truly meaningful IMO is a git log that reads like a product change log»
I appreciate that point of view, but I don't share it. A product change log, I feel, is a bit of marketing/PR that needs some time, love, and editing; I find a git log is for catching snapshots of raw progress and more often useful in seeing what your co-developers are up to, as they are working.
Even commits that lack good commit messages provide valuable information in the form of insight into the author's cumulative thinking/process.
I'm an engineer, and I see the commit history as a tool. When I want to know what a block of code is for and why it was written the way it was, the commit history (if it is clean and granular) will tell me a lot about that, and will point me to authors, issues, features, and requirements where I can learn more. I don't care about the process of producing the code, I care about the end result. I get enough exposure to the process when I'm writing my own code.
I've had it put to me that an Architect deals solely with the art of a project and a Scientist deals solely with the science and theory; it's the work of an Engineer to deal in the practical middle where art meets science (meets the real world).
Sometimes it is easy to overlook (or to want to overlook) the little bits of humanity in the machine; the various sorts of creative chaos in the vast ordered systems; the parts of the code that are art.
There's no easy answers to much of this thread, because it is art, it is aesthetics. There's no "right" answer, just "this looks good and pleasing to me and my team" and working to find that practical Engineering border space between the unwavering art of the Architect and the similarly unwavering logic and discipline of the Scientist.
I'll give you another analogy...If I'm composing a song, and you're not particularly trying to learn how to write songs, what is more relevant to you, the end result or the various drafts and early versions that made it to that end result? For the majority of people who aren't interested in learning the mechanics of songwriting, the "journey" is definitely not as interesting as the destination. For nerds like you and I, that's part of the fun! So I try to keep the original intent of my commits, and preserve their messages in a bullet-pointed list format, to show the individual changes that were made in addition to a higher-level overview of the overall change to the project.
TLDR: Project-level changes are not the same as individual changes, and while both should be represented in commit messages, the project-level changes are overwhelmingly more useful in the future. Git is not about code storage, it's about code communication. It's about developers on the same team communicating with both prose and code in tandem.
Multi-parent commits complicate the use of git. When a commit has a single parent, we can pretend that it's a delta: a patch. (Like it fscking should be in a decent version control system based on some sort of patch theory!) When we "show -p" that commit, we get a diff, which is against its one and only parent. Multiple parents also complicate certain situations. They rear their ugly heads and create an ambiguity. For instance, consider git cherry-pick. If a given commit has just one parent, we can pretend that it's a delta and "git cherry-pick" it. If it has multiple parents, the ugly truth is revealed: a git version isn't actually a delta. If you want that change, you need to specify the parent!
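The cherry-pick ambiguity is easy to reproduce. A minimal sketch, with a throwaway repository and made-up branch and file names: picking a merge commit is refused until a parent is named with -m, at which point the merge is treated as a delta against that parent.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.invalid"

echo base > base.txt
git add base.txt
git commit -q -m "Initial commit"
git branch release                  # a second line of development to pick onto

git checkout -q -b feature
echo feature > feature.txt
git add feature.txt
git commit -q -m "Add feature"

git checkout -q master
echo other > other.txt
git add other.txt
git commit -q -m "Unrelated mainline work"
git merge -q --no-ff -m "Merge feature" feature
merge=$(git rev-parse HEAD)

git checkout -q release

# Without -m, Git refuses: a merge has two parents, so "the" delta is ambiguous.
if git cherry-pick "$merge" 2>/dev/null; then
    plain_pick=accepted
else
    plain_pick=refused
fi
# Clean up in case anything was left in progress (harmless if nothing is).
git cherry-pick --abort 2>/dev/null || true

# With -m 1 the change is computed against the first parent (the mainline side),
# so only the feature's file arrives on the release branch.
git cherry-pick -m 1 "$merge" >/dev/null

echo "plain cherry-pick was $plain_pick"
ls
```

With -m 2 the same merge would instead be read as "what the mainline added relative to the feature branch", which is rarely what anyone wants, so the parent number matters.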
The parent of a commit should be the thing that the work was actually based on: the work that the developer took and massaged to create the new baseline representing the commit. When you have multiple parents, only one of the parents actually meets this definition. The others are arbitrary nodes in the system which are just installed as the parents.
Also, Linus endorses cleaning up your WIP commits before pushing.
Often there will be counter-intuitive bits of code that make sense in "git blame" if you see the original, small, commit that they were created as part of. If they're part of a 1000+ line feature bomb, you lose that important context.
Whilst my small commits were squashed, it shows that even well formed smaller commits that are part of a larger change can often be problematic.
Keeping your commit history clean is important. When I'm bisecting, I don't want to see coding errors like typos and syntax errors; they literally get in the way of the bisect. And when I'm reading through a source file, I'd like each commit to be significant, or at least entirely relevant to a change. Minor syntax error changes, whilst they can still sneak into the master repository, should be few and far between.
Basically, it also encourages unit testing, rechecking your code, continuous integration, and a raft of other good and best practices around coding. And your colleagues will thank you.
Merge squashing is actually a pretty decent way around this - I'm going to use it as my workflow now. I'll make frequent code commits on a separate branch, then squash down into another branch, then push this.
It's weird. Git was made for something very specific, Linux kernel development. It made a lot of decisions to support that environment. However most of us don't work in that type of environment.
If you have a private repo for your job you're in a very different environment. At my job mutating history is the opposite of what I want. I don't want people mutating history. Ever!
Part of the problem, in my opinion, is that Git encourages tiny commits. I might take it a step further and say that Git mandates tiny commits. Too tiny in my opinion. When that's forced upon you a clean mechanism is required. But I'm not sure it isn't sweeping another problem under the rug.
But I'm weird. I'm a game developer. We all use Perforce. It just works. You can't fuck it up. You can't ruin history. You can't get stuck. Artists and designers can be trained to use it from scratch in 5 minutes. It's so easy to use there aren't tens of thousands of blog posts desperately trying to explain how easy it is to use.
To me it seems like a big dump of files like a network mount with locking and some kind of history. But how the hell is one supposed to write software with it?
It has complicated tools for sharing incomplete work. I don't know how you do code reviews but we have a Perl script for that(!).
In other words: if you think p4 is simple and git is not, it's because of your background.
Unless someone checks out everything in the depot by accident..
There is no technical reason a source control tool could not offer multiple 'views' into the commit history. A high level linear view which you can zoom into to see the underlying commits and merges. Why do I have to lose the latter to see the former?
(A) how important the commit is and (B) how many lines of code are changed.
Squash usually makes sense when you have a branch with multiple small commits that affect the same thing. It doesn't make sense to squash 2 commits affecting large parts of the application just because your commit has to be squashed before merging.
Other than that, maybe all of this is an indication of a need for a meta-UX over the change graph to annotate and describe subgraphs in new ways.
If you want to dig into the actual commit, and the dirty truth under the surface - use Gerrit.. you can even use git itself to pull down the truth.
I'm expecting this github feature to be the same, the pull request probably keeps the truth somehow.
I don't have all the answers of what the tooling should be, I just think this is as good an opportunity to discuss it as any.
GitHub's long-standing --no-ff merges at least are one way of preserving the code review and its internal changes directly into the git DAG. This mostly works except for tools like git bisect that treat the DAG as if it were a straight line, rather than making use of the fact that the system already supports complicated graphs.
Furthermore, along the questions of why use two tools to navigate the code repository: I sort of wish that things like GitHub PR comments and code annotations made their way somehow into nodes in the git DAG.
The easiest time to catch a bug is when it's hiding in a 10 line diff, especially before you commit it.
It gets progressively more difficult as the scope of the changeset grows.
It looks like it comes down to the style of the dev/team, because I NEVER do any kind of WIP commit.
Err, it doesn't make it look like the "lie of a" straight line, it makes it into a straight line. Whatever the other developers do, I move the project forward one ball of functionality at a time when their changes are useful to mainline.
When you use software, why do you run a "release" instead of whatever happens to be in the dev's directory when they leave for lunch? Don't you feel dishonest getting the version without the bugs?
> Seeing how the sausage was actually made (no rebases, no squashes, sometimes not even fast-forwards) isn't pretty, but it is meaningful
Perhaps if I was going to hire you it'd be interesting to glance at how you work with nobody looking. Do you keep your desk tidy or not?
But it's absolutely irrelevant to the final project and as such, it shouldn't be stored.
> I trust that. It's real and visceral and how software is actually made
You should read Tracy Kidder's _The Soul of a New Machine_, it's a good read about sausage.
But it's not how you should work because you have choice now.
I'd like to think, however, and I think that this is my larger point in this thread, that maybe we could build better storytelling tools that don't delete/mangle/mutate the actual history so that we can see in the same repository both the story and the raw facts.
Using fast-forward (and possibly only allowing fast-forward) is a good idea. Squashing entire pull requests that may change multiple things into a single commit is a very bad idea.
However, I frequently see people adding more commits on top of a pull request to fix typos, or do incremental development, where only the final result builds and passes, but not the intermediate stages, and where the changes are scattered among the commits with no logical grouping. In that case, I'd rather see them squashed and merged than merged in their existing form, and having a button to do that makes it more likely to happen.
I've seen projects where maintainers clean up poor commits before merging them: rebase/squash/reword only what's appropriate.
That's a pretty good reason not to squash till the review is done.
I think the argument can be made that if you don't feel comfortable performing a squashed merge of a PR, then that PR contains too much work and should be split up. However, I don't think there's an easy rule to decide in either case.
Let's say we're adding an interface/typeclass/protocol and a concrete implementation. I'd say these should be two separate commits, as they're adding two different things. An interface doesn't require a provided implementation to work. But, if we were to create those as two separate pull requests, it would be more work for the project maintainers, and the initiator wouldn't be able to create the PR for the concrete implementation until the interface PR was merged - the concrete PR can't be added as a dependent PR of the interface one, or something to that effect.
Since you can "compare" almost anything on GitHub, small commits aren't really an issue, just view a larger-scope comparison to get an idea of the whole PR.
Another way to put this might be that commits are for individual code changes that build up to a pull request, which is a conceptual change?
See, and maybe this is because I'm just dumb or something, but I have never gotten rebasing to work for me. Ever. Every single time I do it I read at least 3 articles about it so I don't screw something up, I attempt to do it and ultimately I lose a bunch of work.
I just don't get it. I can write web, mobile and desktop apps and I like to think I'm pretty good at it. But I'm one of those people who constantly have commits of merges in their code because for whatever reason I just can't get my head around making rebasing work correctly.
Am I the only one? Sorry for the derail but it's bothering me that I've never gotten this to work correctly and I feel otherwise normally smart. ¯\_(ツ)_/¯
1. Always use the "upstream" branch as your rebase target - "git rebase -i master", or "git rebase -i origin/master". This is almost always what you want, and picking the wrong base is the most common error I've seen when teaching people rebase -i.
2. Use autosquash! https://robots.thoughtbot.com/autosquashing-git-commits. If you have trouble with the text-editor interface you get when you run rebase -i, this will both handle its usage, and in the long run give you some visual examples of how the interface is supposed to be used. If you're really into this, set the config option "rebase.autoSquash true" to avoid the extra command-line flag.
3. If you mess up and realize in the middle, git rebase --abort.
4. Use the reflog after the fact for both finding and undoing mistakes: git diff branchname branchname@{1} to check for unintended code differences, and git reset --hard branchname@{1} to undo the rebase.
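Tips 2 and 4 can be tried together in a scratch repository. This is a minimal sketch with made-up file and branch names. Setting GIT_SEQUENCE_EDITOR=true is an assumption made so the demo runs without opening an editor: it accepts the todo list exactly as --autosquash arranged it. In real use you would let the editor open and read the list.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.invalid"

echo base > app.txt
git add app.txt
git commit -q -m "Initial commit"

git checkout -q -b feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"
target=$(git rev-parse HEAD)

echo "docs" > docs.txt
git add docs.txt
git commit -q -m "Add docs"

# A later fix that logically belongs in "Add feature": mark it as a fixup.
echo "feature, fixed" > feature.txt
git commit -q -a --fixup="$target"

before=$(git rev-list --count master..feature)

# --autosquash moves the fixup next to its target and folds it in.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash master

after=$(git rev-list --count master..feature)
echo "commits on feature: $before before, $after after"
git log --oneline master..feature

# Reflog check from tip 4: the rewritten branch should have the same content
# as its previous position, feature@{1}.
if git diff --quiet feature 'feature@{1}'; then
    same_content=yes
else
    same_content=no
fi
echo "same content as before the rebase: $same_content"
```

If the check had come back "no", "git reset --hard feature@{1}" would put the branch back where it was before the rebase.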
$ git checkout master
$ git pull
$ git checkout branch-name
$ git rebase master
If there are merge conflicts, open the affected file(s) and resolve them. Then:
$ git add filename.ext
$ git rebase --continue
Finally:
$ git push origin branch-name
If you've already pushed the branch, use -f. Make sure to always specify the branch name when using that flag!
If you have a branch and it's already pushed, rebasing just feels kind of funny and can sometimes cause a lot of problems if anyone else has checked it out.
If you have a branch and it's local only, then merging from mainline into your branch and selecting rebase instead of merge is relatively painless.
One trick that's worked ok for me in a private repo is, before starting to edit the fix-spline-reticulation branch (which has a handful of separate logical changes, fixes discovered midway through a later change that really belong in an earlier change, and temporary debug code that was never meant to go into the product) for publication, to do
git branch fix-spline-reticulation.0
(or .the-next-sequential-number). Then no matter how badly the "rebase -i master" goes, there's a branch tag pointing at the original state, and
git checkout fix-spline-reticulation.0
git branch -D fix-spline-reticulation
git checkout -b fix-spline-reticulation
will destroy the failed attempt and restore the branch to its earlier state. (Note that if you decide in the middle of the rebase that you're losing, "git rebase --abort" will undo anything you've done so far; you need the backup only if you regret the rebase after you're finished.) It also makes it easy to "git diff fix-spline-reticulation.0..fix-spline-reticulation" and confirm that all the changes in the edited history add up to the same as the real history.
Sometimes I do this in the middle of development to move all the changes intended for the product ahead of the temp debug stuff in case I suspect the debug code is causing problems. Keeping the debug code in the dev branch even after the cleanup rebase makes the diff to check the rebase easier (then, of course, the merge should take the commit just before the debug).
Best never to let anything but the cleaned-up branch hit a shared repo.
Rebase takes a little bit of practice, but everyone who's using git owes it to themselves to learn it by heart. It's almost like having superpowers compared to any VCS which doesn't have rebase.
My advice[1] would be to simply create some dummy repository (perhaps just copy an existing repository with some real code) and go through various scenarios described in the git-rebase man page (using some trivial changes). If something blows up, don't worry, you can always just start from scratch.
The key to making rebase work for you is: 1) understanding the underlying model of git[2], and 2) practice, practice, practice. With enough practice you'll get a good feeling for which "type" of rebase works best in a given situation.
[1] In addition to the excellent advice given by others in this thread.
[2] It may look like it's really all based on snapshots of files, but the workflows are definitely mostly centered around patch-based thinking.
ultimately I lose a bunch of work.
Take a copy of the entire repository before attempting anything potentially destructive.
This is GitHub's attempt to solve the problem without really changing anything. It won't really change anything. Since pull requests routinely contain a mixture of both changes that should be squashed (fixups) and changes that should not be squashed (independent changes), this just means that you get to pick your poison.
When coworkers create pull requests I don't go through all of their commits and changes along the way. I just look at the diff, so I don't see the need for them to squash it first.
Take a look at this PR for example:
https://github.com/HearthSim/python-unitypack/pull/4
Lots of back & forth. All the commits are related, and the PR is there to land all of those commits at once. I could land some of them right now (as they're safe to land), but keeping them in the PR keeps everything related in the same place (and none of them are required until that last commit lands).
A PR mirrors a "patchset" on mailing lists. You don't always want to squash all of it.
What you do want to avoid is a situation like this:
https://github.com/jleclanche/django-push-notifications/pull...
Where the original author creates their original commit and doesn't know how to --amend + push --force to the PR, and you end up with a ton of commits which you don't want to land all at once.
$ cat ~/git/ATLAS/.arcconfig
{
  "project_id" : "ATLAS",
  "repository.callsign" : "ATLAS",
  "conduit_uri" : "https://phabricator.$MYCOMPANY/",
  "arc.land.onto.default" : "develop",
  "immutable_history" : true
}
Some people might do similar things but they might not assure each commit is green, and they never squash anything (so you end up with non-meaningful commits).
As @3JPLW said, I can see how it is useful for open source maintainers to have the option to squash someone's commits when the change is small but there are many commits (due to review ping-pong, etc.).
1. branch off master
2. work, commit, push, test (on CI server)
3. decide it's time to ship
4. rebase -i, push, test (again)
5. git checkout master && git merge --no-ff feature_branch
(make the merge commit message a summary of the feature)
master ends up being a list of feature branch commits, bookended by the merge commit which introduced the feature. Getting the squash commit diff is as easy as 'git diff feature_branch_merge^..feature_branch_merge'.
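The last step of that workflow can be tried out in a scratch repository. This is only a sketch: the file contents and commit messages are invented, and the rebase -i cleanup and CI steps are left out because they are interactive or external.

```shell
set -e
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.invalid"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.invalid"

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master

echo base > app.txt
git add app.txt
git commit -q -m "base"

# Steps 1-2: branch off master and commit the work.
git checkout -q -b feature_branch
echo one >> app.txt
git commit -q -am "Add step one"
echo two >> app.txt
git commit -q -am "Add step two"

# Step 5: merge with an explicit merge commit that summarises the feature.
git checkout -q master
git merge -q --no-ff -m "Merge feature_branch: summary of the feature" feature_branch

merge=$(git rev-parse HEAD)

# The merge commit's second parent is the tip of the feature branch.
second_parent=$(git rev-parse "$merge^2")

# Diffing the merge against its first parent gives the "squashed" view.
added=$(git diff "$merge^..$merge" | grep -c '^+[^+]')
echo "lines added by the feature: $added"

# Following only first parents hides the individual feature commits.
first_parent_commits=$(git rev-list --count --first-parent master)
```

Here "git log --first-parent master" reads like a squashed history, while the individual commits stay reachable for blame and bisect.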
Using fast-forward without squash is also a bad idea in many cases: the string of commits may contain multiple points that don't actually build or pass tests, even if the final commit in the chain fixes all that. There's no point in landing those broken commits, and doing so will confuse bisection tools.
Fast-forward with squash, and enforcing reasonably sized code reviews as a matter of culture, is the best of all worlds in my opinion.
If changes are too large/complex/disjoint to fix in a single commit then why have them in one PR?
This unfairly places the blame for Git's utterly shitty UX on the part of the users. When you have thousands of users who struggle to use a tool correctly, it's the tool's fault, not theirs.
I've been using Git for years, work professionally full time on an open source project that lives on GitHub, maintain several open source projects with a number of committers and generally live and breathe Git and GitHub all day.
I still fucking hate rebasing and get tripped up by it on the few times I end up having to deal with it.
When I had no understanding of what was going on, I didn't like it either. Now that I use it frequently, I understand it better, so I don't hate it anymore.
I like running `git rebase <main-branch>`, where <main-branch> is typically master, in my-new-thing branch because it lets me deal with any conflicts from upstream one by one.
I also like running `git rebase -i` in my-new-thing when I have a bunch of commits with redundant messages that I want squashed into a single commit before I push the changes. Basically anything that requires messing around with a range of commits is a good use case for `git rebase -i`.
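For anyone who wants to see the first of those in action, here is a small sketch in a temporary repository. The branch name my-new-thing comes from the comment above; the files and messages are made up, and there are deliberately no conflicts, so the rebase runs straight through. The interactive variant is not shown because it needs an editor.

```shell
set -e
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.invalid"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.invalid"

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master

echo base > app.txt
git add app.txt
git commit -q -m "base"

# Local work on a topic branch.
git checkout -q -b my-new-thing
echo feature > feature.txt
git add feature.txt
git commit -q -m "Add feature"

# Meanwhile, master moves on.
git checkout -q master
echo upstream > upstream.txt
git add upstream.txt
git commit -q -m "Upstream change"

# Replay the topic branch on top of the new master.
git checkout -q my-new-thing
git rebase -q master

# After the rebase, master is a direct ancestor of the topic branch.
if git merge-base --is-ancestor master my-new-thing; then
    linear=yes
else
    linear=no
fi
echo "linear history: $linear"
```

With real conflicts, the rebase would stop at each conflicting commit in turn, which is the "one by one" behaviour described above.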
Why do you hate it so much? There's really not much going on that you should have to hate. To me it's like a bunch of small, compartmentalized merges.
Being subjected to Git at work makes me long for the days when I used Lotus Notes for email. Sure Lotus Notes is a usability disaster, but at least you get the sense they were trying to make things work. And, to be fair, it got slightly better each version. Using Git makes you feel like it was developed by people who simply hate you and want to see you fail.
Give us tools to mark commits as unimportant or group them together as a meta-commit object for history purposes.
However, I think this is something that will have to be built into git itself, not by Github.
If somebody suggested keeping data that showed every step of a person writing a report, they'd be laughed out of the room.
It's a gray area that varies significantly by feature complexity and team dynamic. I find it frustrating when someone merges a branch with 1 real commit and 3 more one-line commits or trivial fixes that should have been in the first.
Whilst it's clear that retaining the pushed history can be useful in some cases, I don't understand the notion that disallowing history rewrites helps retain useful data in general. Developers can make commits for all kinds of arbitrary reasons, e.g. they reached the end of the day, or had to switch branches to work on a different patch. That doesn't seem like a particularly useful thing to record. To take it to an extreme, I wonder if any of the people who think that the precise commit history gives useful information about how a feature was developed have configured their editor to commit on keystroke, since that seems like the logical conclusion of that position (and indeed is effectively what tools like etherpad do). I suspect not because, actually, being selective about the information that you keep is rather helpful. During rebase we see the developer as curator, selecting the most useful representation of a set of changes for the benefit of future readers. Or, to use your analogy, if someone suggested removing meaningless data from a report to focus attention on the most important points, then they would certainly not be laughed at, but praised.
Of course GitHub's implementation is too blunt a tool to be really useful, but hopefully we will eventually get something better.
1: Completely ruin your ability to git bisect any bug injected in your branch. Instead of getting a 10 line commit, bisect will point you to hundreds or thousands of lines instead.
2: All code will blame to a single person. Code with 6 people on a large branch? Want to git blame the code to see who wrote the function that is weird looking so you can ask questions? Too bad.
Do not squash branches on teams. One of the biggest mistakes of my professional career.
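To make point 1 concrete, here is a toy bisect over small commits. Everything in it is invented for illustration: the "bug" is just a marker line in a text file, and the test command is a grep. With granular commits, bisect lands on the one small commit that introduced the marker; had the four steps been squashed, it could only have pointed at the combined commit.

```shell
set -e
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.invalid"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.invalid"

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master

echo ok > app.txt
git add app.txt
git commit -q -m "base"
good=$(git rev-parse HEAD)

echo "step 1" >> app.txt
git commit -q -am "step 1"
echo "BUG" >> app.txt
git commit -q -am "step 2 (introduces the bug)"
culprit=$(git rev-parse HEAD)
echo "step 3" >> app.txt
git commit -q -am "step 3"
echo "step 4" >> app.txt
git commit -q -am "step 4"

# HEAD is known bad, the base commit is known good.
git bisect start HEAD "$good" > /dev/null

# The test exits 0 (good) when the marker is absent and 1 (bad) when present.
git bisect run sh -c '! grep -q BUG app.txt' > /dev/null

found=$(git rev-parse refs/bisect/bad)
git bisect reset > /dev/null 2>&1

echo "first bad commit: $(git log -1 --format=%s "$found")"
```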
point 2: As people mentioned, this is for PRs done by individual people, usually squashing their local history. If multiple people were working on the branch, they should have been PR-ing against that branch (squashing the commits), then you merge that branch to master (you may rebase it, but not do significantly destructive stuff).
Even then, these are features to use with critical thinking. If your PR is really large, massage the history to be relevant and meaningful for git bisect purposes. Use full squashing when it was local work full of "My hands are typing words" commits of no relevance.
I swear, we're in an industry full of people making 6 figure salaries, who treat their job as if it was a call center script to follow. If you're an engineer, use your head and engineer solutions around the tools you have.
Squashing is a tool to use with discretion. You squash two or three 5 line wip commits together. And I'd never squash changes from two authors (except perhaps whitespace or comment fixes).
* https://github.com/blog/1815-l-is-for-labels
* https://github.com/blog/1451-branch-and-tag-labels-for-commi...
* https://github.com/blog/626-announcing-svn-support
[0]: https://twitter.com/gjtorikian/status/715972348860633088
I'd love to see a commit linter, that points at commits with text like "oops" and "fix my derp" to suggest possible commits to squash.
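A very crude version of such a linter is just a grep over commit subjects. The word list below is an assumption, not an established convention, and the demo repository and its messages are made up; a real tool would need a better heuristic.

```shell
set -e
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.invalid"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.invalid"

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master

echo a > f.txt
git add f.txt
git commit -q -m "Add parser"
echo b >> f.txt
git commit -q -am "oops"
echo c >> f.txt
git commit -q -am "Handle empty input"
echo d >> f.txt
git commit -q -am "fix my derp"

# Hypothetical list of words that hint at a fixup commit.
pattern='oops|derp|fixup|wip|typo'

# Print "<short hash> <subject>" for every commit whose subject matches.
suspects=$(git log --format='%h %s' | grep -iwE "$pattern" || true)
count=$(printf '%s\n' "$suspects" | grep -c . || true)

echo "possible squash candidates ($count):"
printf '%s\n' "$suspects"
```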
Git history shouldn't resemble a hot mess, but the evolution of code should be pretty granular. I'd take the hot mess over full squashing, though.
Also, it's not clear if it is possible to disable the merge button completely. I prefer to use the command line to rebase and fix the details in the commits, but the big green "merge" button is always too tempting and it's easy to press it by mistake.
On both you cannot even see when the branch was started, and you can merge fine as long as there were no merge issues. Then you run gitk and it looks like a spaghetti-horror, with trivial branches being started 200 commits ago.
I don't necessarily need a "rebase&merge" button but at least info about the shared ancestor with master.
I typically did it through git rebase -i HEAD~N, so maybe someone here on HN knows of a better way to squash a commit whenever you're updating remote history. Albeit, it seems that updating remote history with a squashed commit isn't entirely attractive behavior and that's why I was forced to force push.
Instead of git rebase -i HEAD~N, I typically use git reset --soft HEAD~N followed by git commit.
Instead of git push --force, use git push --force-with-lease. This only updates the remote if the remote's current state is what you expect. For example, it will fail if someone else has pushed to the same branch in the meantime.
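Both suggestions can be exercised against a local bare repository standing in for the remote. This is a sketch under assumptions: the directory layout, the second clone playing "someone else", and all commit messages are invented, and N is 3 here.

```shell
set -e
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.invalid"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.invalid"

work=$(mktemp -d)
git init -q --bare "$work/remote.git"      # stand-in for the shared remote
git init -q "$work/mine"
cd "$work/mine"
git symbolic-ref HEAD refs/heads/master
git remote add origin "$work/remote.git"

echo base > app.txt
git add app.txt
git commit -q -m "base"
for n in 1 2 3; do
    echo "change $n" >> app.txt
    git commit -q -am "wip $n"
done
git push -q origin master

# Squash the last 3 commits: move HEAD back, keep the changes staged, recommit.
git reset -q --soft HEAD~3
git commit -q -m "Add feature"
squashed_count=$(git rev-list --count HEAD)

# The remote still matches what we last saw, so the lease holds.
git push -q --force-with-lease origin master
remote_after=$(git --git-dir="$work/remote.git" rev-parse master)

# Someone else now pushes to the same branch.
git clone -q -b master "$work/remote.git" "$work/other"
(cd "$work/other" && echo extra >> app.txt &&
    git commit -q -am "Someone else's commit" && git push -q origin master)

# We rewrite again without fetching; the lease is now stale.
git commit -q --amend -m "Add feature (reworded)"
if git push -q --force-with-lease origin master 2> /dev/null; then
    lease=accepted
else
    lease=rejected
fi
echo "second forced push: $lease"
```

A plain "git push --force" at the end would have silently discarded the other person's commit, which is the case --force-with-lease guards against.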
But yeah, force push is kind of inherent to the process - you're rewriting history, no two ways about it. Usually for my own forks of projects, though, I'll go into .git/config and add a 'push' option with a + in front of the refspec - this enables force pushing always. This really only works and is safe for workflows where there's a distinction between your own personal GitHub account and the upstream that has the authoritative copy - you really don't want to rewrite history by accident on the latter.
CI will still run against the hypothetical merge commit, no? I wonder if there are edge cases where merge vs squash+fast-forward would result in different conflict resolutions and different trees, so master could end up with a tree that didn't have tests run against it.
Isn't that already the norm? I.e., most projects run CI on an unmerged PR, then merge to master, then run CI on master to see if the merged tree is actually good. Are there projects which test a PR's post-merge tree before updating master? That would be nice, but I don't know of any tools which support that workflow, and it wouldn't scale forever since it would involve a serial merge queue.
I use 'git add -p' judiciously and only commit when having reached a point where something can be usefully said to be in some way "done". Sure, it's not perfect, and occasionally I end up having to do some cleanup of miscellaneous printf statements, debug values or typos in subsequent commits, but this is something that should really be avoided if possible.
Besides the obvious 'hard drive failure' or 'laptop stolen' situations, there are also more frequent situations where 'oh shit, I went down a totally wrong path there - let me back up a bit and try that again.' Git commits are little save points that let me do small experimentation and go back if I need to.
[1]: http://feedback.gitlab.com/forums/176466-deprecated-feedback...
I can understand a desire to filter out stages of a project by different levels of review (I just ran the build and tests passed so I committed vs a bunch of people reviewed it so I merged), but that people solve that by deleting or rewriting history to be something different from what actually happened is just nuts.
Is it just that git makes it easier to change history than to add metadata for filtering? Why have we not seen presentation tools to solve this problem rather than what seems like a ubiquitous readiness to alter and throw away the messy but accurate facts about what happened?
It's a lot easier when an entire changeset has a single checkin into master, because then when you're doing your bisection there are far fewer changes to bisect.
Once that is satisfied, commits should be as small as possible, so that information about the grouping of changes is preserved.
git checkout master
git merge --no-ff feature-branch
[0]: https://confluence.atlassian.com/bitbucketserver/bitbucket-s...
A feature branch needs cleaning up before publishing. These bug and typo fixes need to be squashed in the branch, but the history should preserve a sequence of commits corresponding to atomic changes because it tells a story.
A commit should be small enough to make it easy to check if the change is correct. It should be self-contained so that it can be moved around, cherry-picked, etc.
This is why the local commit history should be considered as a draft of the story, and the one published should be the official one aimed to be easily readable, verifiable and manipulable.
Why not make it easier to see what I want to see, but then let me drill down and see more, instead of removing the details altogether?
Squashing commits is a useless thing that has absolutely no benefit. It's dumb. It really makes no sense. It has very clear negatives.
I mean, if you want your history simplified into a linear squashed series of commits, why the hell do you use a VCS which models history as a directed acyclic graph in the first place?
To be charitable: Git seems to be a good tool designed for Problem A, being widely used for Problems B,C,D for which it is a fairly poor choice.
I _think_ we got here through some mix of "but it's really fast!"; "I can do _anything_!!"; "But Linus says it's great!"; "I don't need to pay for a beefy server any more!"; "New _must_ be good, right?".
In any event, how do we get out of the ditch and back to work making software vs. trying to reason about the unreasonable? I personally, for the kind of projects I'm involved with (small teams, all paid by the same piper, with aligned clear goals and competent coders), had perfectly satisfactory revision control systems since around 2000 (except when required by employer to use Clearcase..). It would be nice to get back to that future.
I contributed a little to JUnit and they ask you to squash your commits before making a pull request. It took some time to do, and it's so confusing/weird using regular Git.
But then the conclusion doesn't quite add up. If I want to remove all the merge mess from a PR, don't I actually want to rebase the PR, not merely squash the history? Or do I miss the point he was making?
If it were up to me, it would be a lot harder to get a developer's license, and you would have to meet regulatory standards and have degrees to uphold your professionalism while acting as your own developer. It seems to me that a lot of developers have taken things into their own hands and are trying to make a quick buck any way they can get it. Don't be surprised if you tell your mother or father or sibling to look at what licenses they have agreed to on their phones and they find a lot of outdated, unassigned licenses to back up their privacy. It seems to be an epidemic, and how are we going to stop it?
This leaves all of the feature's changes in your working copy, which you can then stage by line/hunk and individually commit in clean, atomic pieces with the benefit of already having written the final code.
This removes the in-progress commits like a Squash, but pieces of code can still be brought in as individual easy-to-review commits. And it's potentially easier to understand / perform manually than rewriting history via a git command that would do the same thing.
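The command being described is not shown here, so the following is only a guess at the shape of that workflow: a mixed "git reset" to the merge base is one way (assumed, not confirmed by the comment) to end up with all of the feature's changes uncommitted in the working copy. Whole files are staged below so the sketch runs unattended; in practice "git add -p" would pick individual hunks.

```shell
set -e
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.invalid"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.invalid"

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master

echo base > app.txt
git add app.txt
git commit -q -m "base"

# Messy in-progress history on the feature branch.
git checkout -q -b feature
echo "parser" > parser.txt
git add parser.txt
git commit -q -m "wip"
echo "tests" > tests.txt
git add tests.txt
git commit -q -m "more wip"
echo "parser v2" > parser.txt
git commit -q -am "fix"
final_tree=$(git rev-parse 'HEAD^{tree}')

# Assumption: drop the commits but keep every change in the working copy.
git reset -q "$(git merge-base master feature)"

# Recommit the finished code in clean, reviewable pieces.
git add parser.txt
git commit -q -m "Add parser"
git add tests.txt
git commit -q -m "Add parser tests"

new_tree=$(git rev-parse 'HEAD^{tree}')
git log --format='%s' master..feature
```

The tree hash is compared before and after because it is a cheap way to confirm that only the history changed, not the code.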
Additionally, the lack of the commit history for that branch often caused merge conflicts when merging the same branch again (if it had additional new fixes, for example). That's why I switched to using git merge --no-ff.
If it were very easy to do, I would definitely do a git squash, as it keeps history very clean. I just don't want to have the problems listed above.
If you squash properly, each commit will represent a small standalone feature.
It does reduce your commit count though :(
Here's how merge commits, rebasing and isolated small commits work:
You branch your topic off master and make many commits while working on it. This is all local.
Once you're ready to publish for review/integration, you squash fixup and backup commits into a coherent patch set, where each and every revision builds and works. This is where you can rebase and squash for good reason. Now you can push to Github.
If you, like Phabricator, create a single big commit with all changes lumped together, then it's impossible to bisect and follow the thought process behind changes. Try to git-bisect linux.git vs a repo that's been managed with Phabricator and its mega commits.
With small commits that each make one coherent change, you can easily include relevant explanations in the commit message, which is much harder to achieve with a single mega commit. Further, it's very simple to follow along the development process of changes with separate commits. If you have one big diff, it's hard to understand the changes of a branch, whereas reviewing small commits with an explanation in the message and the overall reduction in diff size makes it much, much easier to understand for reviewers.
With separate small commits you review each step and finally arrive at the complete feature implementation at the end. For someone who has to review code they didn't write, this frees up a huge amount of valuable reviewer time for actual reviewing rather than trying to reverse engineer the steps taken in a big diff.
Moreover, with multiple commits, you can easily approve of some of the commits, while requesting improvements for others.
Gerrit implements this well and the process is what linux and git and other projects use when reviewing big patch sets. Set is the important word.
Finally, why do you want merge commits? Unless you always make a single mega commit à la Phabricator or the new GitHub feature, having merge commits provides a very practical way to see that a set of commits landed via foo-branch-X. If you've ever viewed a git log graph, those are the interesting integration points, which you will lose if you omit merge commits. In a merge commit you can also include extra stuff as part of the merge commit itself, so it's not just useless metadata.
"This is a git extension that merges a pull request or topic branch via rebasing so as to avoid a merge commit."
Regarding non-merge workflows: For me the question is:
should git history reflect a literal record of keystrokes or should it reflect intent?
I strongly believe in the latter.