Squash your commits

(github.com)

702 points · WillAbides · 10y ago · 339 comments

WorldMaker · 10y ago

Sometimes I feel like it's a minority position, but I find it strange how much effort people put into essentially making the git DAG look like a (lie of a) straight-line CVS or SVN commit list. Seeing how the sausage was actually made (no rebases, no squashes, sometimes not even fast-forwards) isn't pretty, but it is meaningful and will tell you a great deal about a project and its developers... I trust that. It's real and visceral and how software is actually made, and you can learn from it or find things to explore in that jungle. Projects with multiple developers that nonetheless have straight-line commit histories and super tidy commits are aberrations and full of little lies...

Kudos to GitHub for providing this feature that a lot of people have asked for. I obviously don't plan to use it, but I appreciate that it's an option for those people that like their small, harmless lies. ;)

phasmantistes · 10y ago

I actually disagree. A large team having a linear commit history doesn't mean that history is a lie. It means that the code review process is more important than the code writing process.

For example: I check out a repository, and create a local feature branch. I create a commit containing the tests for the new feature, then one for the first draft of the new feature, then two or three for bugfixes. Each commit is small, and self-contained, but importantly isn't standalone. If someone checked out the repository in the middle of my chain of commits, they wouldn't have a working product. Then I upload my change for code review. There's no point in reviewing each of my ~5 commits individually: they only make sense to the reviewer as a combined unit. And there's no point in landing them individually: they only make sense for the overall project history as a combined unit.

In a project with many developers (e.g. 1,000 like the Chromium project), every developer has different local practices. Some keep their work based on HEAD of master via rebase, others via merges. Some do test-driven development, some don't. Making the code review the atomic unit of work, rather than the messy string of local commits, helps the project enforce common etiquette, commit formatting, and readable history.

sulam · 10y ago

caveat: I was responsible for code review for 2000+ developers.

We only allowed squash commits on master because of what you're describing. That is the level where history "made sense". However, for code review, we wanted to support both styles, because there is an advantage sometimes to seeing the sausage being made. For instance someone will refactor something -- maybe change a method name. Then they apply that refactoring at all the call sites. Very conscientious developers would break this into two commits. We didn't want the first commit on master, but it made sense to review this way, because it was easier on the reviewers: change, effect of change on everything else.

I call this "telling a story" with your commits. There's a lot of value in that style if you have the time to do it.

The other style of commit-by-commit reviewing, where I see all of the work in progress commits, I don't find valuable at all and I _definitely_ don't want to see on master.

jldugger · 10y ago

> It means that the code review process is more important than the code writing process.

If that were true, the optimal solution is PRs with individual commits that all pass testing. I find it much easier to review a series of small changes for logical correctness than to review them mashed together into a single PR. GitHub recently added this as a feature, so I'm not in a completely invisible minority there.

And then, when the review is over, having discrete commits makes bisecting down to the commit that broke the system more granular.

WorldMaker · 10y ago

«Making the code review the atomic unit of work, rather than the messy string of local commits, helps the project enforce common etiquette, commit formatting, and readable history.»

I agree that code reviews should be important, and key to understanding the history of a project at a more useful timescale. I just disagree that they should be "atomic", or that a reviewer (even, or especially, a later code archeologist) would never have reason to inspect or dive into smaller units within a code review.

Where I think we may agree is that, even if code reviews shouldn't necessarily be "atomic", they should probably be first-class objects when talking about and dealing with source control. In git, you can use --no-ff merges today as a useful approximation of code review boundaries (especially with GitHub PRs, since GitHub merges with --no-ff by default and links the PR number in the merge commit message). It might be nice to see code reviews, or other aggregates of commits and commit graphs, become truly first-class citizens of git in some manner.

lotyrin · 10y ago

Yep, the key to understanding git (why there's mutable history, etc.) is to understand that, because of its history (designed by Linus), it's not designed to make day-to-day developers' lives easier, but to serve FOSS project maintainers and release engineers.

For people who review code for inclusion in a project, track meta-progress on issues, pin versions for release, etc., mutable history means they can squash fixups or fix your whitespace for you, rebase changesets onto other changesets, and have history that reflects the project management strategy.

Justsignedup · 10y ago

Also it is very likely that my feature branch isn't really complete during all the little commits I make. The commits are useful to me, but often don't say much about intent. The whole thing put together may be easier to diff when, a few years in the future, someone tries to figure out what happened there.

itaysk · 10y ago

Regarding your example: why would you commit any of the 5 little commits if they didn't complete the task and also break the product? Wouldn't it be better to commit only when you've completed the task you were working on? (I mean, if the task is made up of sub-tasks, then I would also commit each sub-task or milestone that I feel is important, but that isn't the case here.)

jasode · 10y ago

> Seeing how the sausage was actually made, ... it is meaningful and will tell you a great deal about a project and its developers... I trust that.
> ...tidy commits are aberrations and full of little lies...
> ...small, harmless lies.

Interesting choice of words.

Here's another way to think about squashing private commits for public consumption: programmers do not install keyloggers and upload their entire keystroke history including every Backspace and Ctrl+Z used in their text editor to the repositories. And, most of us wouldn't care about seeing it.

Whether John typed "x = 218^H^H73" or "x = 273" is a meaningless distinction and irrelevant noise. Those spurious ^H Backspaces are equivalent to the twitchy multiple commits in private branches. We really don't want to see them. Think of private noisy commits as an extended workspace of a text editor. If squashing those commits is a lie, the Backspace key without an audited keystroke log is also a lie.

Side note: The other comment downthread about keeping all private commit history for "git bisect" is a red herring. Sometimes a commit will deliberately have broken syntax -- e.g. make a quick commit before getting up to grab a soda -- it won't be CI test worthy. Besides, an automated CI server's cpu cycles can point to an upstream integration/test/qa branch instead of a programmer's private branch.

WorldMaker · 10y ago

«If squashing those commits is a lie, the Backspace key without an audited keystroke log is also a lie.»

In a world with infinite storage space and a good UX on top of it, I could absolutely see a case where it might be amazing to have a source control integration with the full undo stack of my editors. VCR roll through someone's efforts Twitch style and grab a box of popcorn as you drinking game your way through their typos...

That said, I definitely will rebase/squash local WIP stuff on local-only branches on my own machine, as I see fit. Yeah, I see those as harmless lies because I really didn't build it that way, but sometimes that's what makes me feel better about publishing that work.

I appreciate you trying to push this conversation towards its extreme, absurd ends, but I also realize there are a lot of aesthetic judgments here. I for one lean towards keeping more of the little pieces and the interesting digressions, like the spot in a branch where I totally "brb grabbing a soda". Sometimes your taking that break means easier commits for me to review when I'm reviewing your code (whether in a PR right away or in a research effort down the line), as maybe I need a review break there too. I appreciate that not everyone feels the same on this topic.

Stratoscope · 10y ago

> It's real and visceral and how software is actually made...

It's also extraordinarily cluttered, and it really gets in the way when someone later wants to do a 'git bisect' to track down when a bug was introduced.

When I'm working I do frequent little commits just to capture and back up my broken stream-of-thought experiments. None of those are going to be relevant to people who work with this code in the future; how does it benefit them to impose my haphazard process on them?

Of course if there's a way to break up my final commit into more meaningful smaller commits, I do that rather than one monolithic commit.

For example if I clean up some whitespace issues, add some new comments to old code, and implement a new feature, I'll put those in three separate commits even if I originally did the entire change at once. In this case you'll see more commits in the public history than I originally had.

Or if I check in a new version of some external library, add an API call that uses it along with its tests, and add UI code that calls the API, those may be separate commits in that order.

My goal is to make the public history useful to future developers.

marssaxman · 10y ago

I'm curious about these workflows where history matters so much, because people using them must be doing very different things than I do. I use 'git blame' every now and then, 'git show' more frequently, 'git bisect' practically never. I think I would spend at least two orders of magnitude more time squashing commits than I ever have to spend dealing with the consequences of a non-squashed history.

I did a rebase once because there was a big mess I was trying to clean up in order to make a merge work. I would be surprised if anyone has ever cared about the details.

rtpg · 10y ago

But squashing makes git bisect even harder, since the chunks are bigger. If you have a very granular history, then you can see exactly which 15-line change caused the breakage.

(though the counterpoint is that maybe there are parts of the granular history that are just busted for other reasons)
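
To make the granularity trade-off concrete, here is a minimal Python sketch of the binary search that `git bisect` performs. It assumes a linear history where the first commit is known good, the last is known bad, and a breakage stays broken once introduced. The commit data and the `bisect_first_bad` helper are invented for illustration; real `git bisect` walks the DAG and has you (or `git bisect run`) test each checkout.

```python
from typing import Callable, List, Tuple

# Toy commit: (label, lines_changed). The data below is made up.
Commit = Tuple[str, int]

def bisect_first_bad(history: List[Commit],
                     is_broken: Callable[[int], bool]) -> Tuple[int, int]:
    """Return (index of first broken commit, number of commits tested).

    Assumes history[0] is good, history[-1] is bad, and that every commit
    after the first broken one is also broken.
    """
    good, bad = 0, len(history) - 1
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests += 1
        if is_broken(mid):
            bad = mid
        else:
            good = mid
    return bad, tests

# Granular history: 16 small commits; the bug arrives in c11.
granular = [(f"c{i}", 15) for i in range(16)]
idx, steps = bisect_first_bad(granular, lambda i: i >= 11)
print(granular[idx], steps)  # ('c11', 15) 4

# The same work squashed into 4 commits of roughly 60 lines each;
# c11 now lives inside s2.
squashed = [(f"s{i}", 60) for i in range(4)]
idx, steps = bisect_first_bad(squashed, lambda i: i >= 2)
print(squashed[idx], steps)  # ('s2', 60) 2
```

Squashing saves a couple of test runs but hands you a 60-line suspect instead of a 15-line one. The monotonic `is_broken` assumption is also where the counterpoint bites: a mid-branch commit that is broken for unrelated reasons (a WIP commit that doesn't compile) can steer the search wrong, which is what `git bisect skip` exists to work around.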

zachrose · 10y ago

You make a good point about bisecting. (Although I guess if you're too overzealous with rebasing you'll arrive at the commit you're looking for and it will be huge. Balance in everything yadda yadda yadda.)

Pxtl · 10y ago

To me this still sounds like a display problem more than a data problem. The viewers could implicitly hide non-tagged commits and then you can expand them out to see the details if you should need.

stormbrew · 10y ago

What frustrates me is that there are mechanisms to deal with this problem that maintain the DAG and history (cleaned or not), but people jump to the solution of destroying the process behind the code rather than use them.

In particular, I would kill for support for basically the `git log --first-parent` option in viewing the history of a branch on tools like github and gitlab. Rather than squashing your branch, you make your merge commit have a meaningful commit message (which you do anyways for a squashed commit) so that you don't always have to be viewing the tangled web underneath.

There should be no practical difference between a squashed commit and a merge commit from the perspective of the branch that commit is on (they represent the exact same change from parent to child), but the tooling insists on giving you the most complicated possible view all the time so there is a tangible difference.
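
The claim that a squash commit and a merge commit represent the same change from their (first) parent can be checked in a toy model. This Python is a minimal sketch, not git's real object model: a commit holds a snapshot of paths plus a list of parents, and the hypothetical `first_parent_log` imitates what `git log --first-parent` does by following only `parents[0]`. All class and function names here are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

Snapshot = Dict[str, str]  # path -> file contents (toy stand-in for a tree)

@dataclass
class Commit:
    message: str
    snapshot: Snapshot
    parents: List["Commit"] = field(default_factory=list)

def diff(old: Snapshot, new: Snapshot) -> Dict[str, Optional[str]]:
    """Paths whose content changed; None means the path was deleted."""
    paths = sorted(set(old) | set(new))
    return {p: new.get(p) for p in paths if old.get(p) != new.get(p)}

def first_parent_log(tip: Commit) -> List[str]:
    """Mimic `git log --first-parent`: follow only parents[0]."""
    messages = []
    commit: Optional[Commit] = tip
    while commit is not None:
        messages.append(commit.message)
        commit = commit.parents[0] if commit.parents else None
    return messages

base = Commit("Initial import", {"app.py": "v1"})
# Messy feature-branch history, branched from base.
wip1 = Commit("WIP", {"app.py": "v1", "feature.py": "draft"}, [base])
wip2 = Commit("fix typo", {"app.py": "v1", "feature.py": "done"}, [wip1])

# Option A: a --no-ff merge commit with a meaningful message.
merge = Commit("Add feature (PR)", dict(wip2.snapshot), [base, wip2])
# Option B: a squash commit with the same message and the same tree.
squash = Commit("Add feature (PR)", dict(wip2.snapshot), [base])

print(first_parent_log(merge))  # ['Add feature (PR)', 'Initial import']
print(diff(merge.parents[0].snapshot, merge.snapshot) ==
      diff(squash.parents[0].snapshot, squash.snapshot))  # True
```

Both first-parent views of the mainline are identical; the only difference is reachability. `merge.parents[1]` still leads back to the WIP commits for anyone who wants to dig, while the squash commit has dropped them.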

geuis · 10y ago

Couldn't disagree more. Having worked extensively on teams on both sides of this issue, I can experientially state that a well-done git rebase and commit strategy is much more useful and helpful.

In terms of feature branches:

The individual engineer is free to do individual commits in their branch as they need to in order to keep track of their work. Before they submit a pull request, they should rebase and squash all of their commits into a single one that thoroughly describes everything in the feature that is being committed. When used in conjunction with tools like Phabricator, Arcanist, and commit templates, the workflow is very smooth.

When another team member goes to code review their pull request, rather than having to examine multiple individual commits there is only a single one to examine and comment on.

Master history:

Rather than cluttering up the mainline history with 'Did this', 'Did that', 'Merged: Did this', 'Merged: Did that', 'Reverted: Merged: Did this', etc., you get a series of commits that articulately describe what each commit was for. If you need to revert a feature because it breaks something, it's much easier to revert that single commit than to hunt through all of the individual commits from an engineer's feature branch. And in that case, reverting just one of the commits from the feature branch could break something else.

andersonvom · 10y ago

I agree that a well-done git rebase and commit strategy is much more useful. However, squashing everything will (most likely) lead to gigantic commits that are hard to reason about.

A better approach would be to create multiple small commits that work and are self contained. It's ok for commit N to depend on the preceding commit, but each N should be able to stand on its own.

If devs rebase everything before pushing and push often (therefore also rebasing often), conflicts will happen a lot less often. Devs can also use their private branches for temporarily saving all WIP, squashing/rewording only what makes sense before submitting the PR or pushing to master.

WorldMaker · 10y ago

I'm only arguing that what you see as clutter, I see as potentially interesting history, and a mess of merges and reverts is more interesting than people give it credit for. It can tell you quite a lot about a project. (Did you learn anything from reverting 'Merged: Did this' about why it shouldn't have been merged? Did you miss something in 'Merged: Did that' that didn't quite merge easily? That's easier to find and diagnose with smaller, more frequent merges than with big evil merges...)

Anyway, to each their own, and I appreciate your preferences differ from mine.

teacup50 · 10y ago

Have you tried making sense of a project 10 years old? 20? 40?

I can "experientially" state that squash throws away very necessary information for anyone trying to make sense of old code.

dsp1234 · 10y ago

> isn't pretty, but it is meaningful and will tell you a great deal about a project and its developers... I trust that.

Here is a made-up, but generally realistic, git log:

  git log | grep -i WIP
  
  mon 5pm - WIP, going to work on this from home 
  tue 4:45pm - WIP, going to work on this from home
  wed 2:30pm - WIP, meeting
  wed 5pm - WIP
  thu Noon - WIP, working from the cafe on my laptop
  fri 5pm - WIP, working from home
  sat 3pm - WIP, heading home sick for the day

Does it really matter to anyone, or count as anything but noise, that I committed my work into the repository just so I could work on it from a different computer? I can't imagine how low the signal-to-noise ratio would be if every person on the team did this.

WorldMaker · 10y ago

It does tell me a great deal about your development habits, yes.

Maybe not information that I care to do much more than skim, of course, unless I'm your manager looking for reasons why you might be working from home too much. :) (That said, there's probably some cool deep learning applications here...)

Some of the information, for instance, is that maybe you are working on pieces that are too large at a time and should find more ways to break them into smaller units of work that you can more easily commit one logical piece at a time, rather than as "snapshot dumps" between computers.

Like I said, from a hyperbolic standpoint, how the sausage is made isn't pretty and is full of garbage sometimes, but it is informative.

conductor · 10y ago

I can relate to that. I don't like WIP commits from a feature branch (when the program is crippled, doesn't even compile or pass the tests, when the code is filled with temporary "printf" debug messages, etc.) ending up in the history of the master branch.

Ideally, every commit that I make should preferably be not big, but logically complete and working. The problem is that sometimes I want to work on another branch and have to commit in the middle of the work so I can check out the other branch, in which case (without a squashing merge) the "offending" WIP commit would end up in the master branch's history.

jarfil · 10y ago

Get yourself a WIP branch, do whatever you want on it, extract clean commits from it, kill it with fire when finished.

ris · 10y ago

I think this is a strawman - I don't think anyone is suggesting publishing WIP commits.

Pxtl · 10y ago

But why delete them? Why don't the tools just hide them?

chris_wot · 10y ago

I think I would go insane with commit messages like that. Sorry, but they have very little value - I'm afraid that it's not important to know that you've gone out for a meeting, a coffee or sick. Commit messages in a mainline tree should be reserved to explain what you've done, and nothing more.

I would hate to have done a bisect and land on your commit "WIP, heading home sick for the day" as the one that caused the bug. By all means, create a WIP branch, and if you can, push that WIP branch to the main repo, but please squash commits into logical units of change when merging into the feature branch or main trunk!

voltagex_ · 10y ago

Yep, on TFS where branches are a world of pain, commit messages look like this. Sorry.

krisdol · 10y ago

Tell whoever is naming those commits to stop. We don't squash or do PRs, but that is not at all a realistic history.

joshdick · 10y ago

Write better commit messages. Garbage in, garbage out.

InclinedPlane · 10y ago

What annoys me is that so many people take the position that there's no other way to improve the experience than building a hazardous and error-prone system into the core workflow of a tool that should ideally deal primarily with immutable history, rather than building better tooling to manage that complexity and present history in a useful way other than a raw list of commits. I mean, who would build such tools, and isn't managing complexity hard? Oh wait, it turns out we're developers, and managing complexity is the core reason for the existence of all software engineering. It's really rather silly.

But it's a lot easier to convince devs that using tools with bad UI elements is hardcore and makes them look smart instead of demanding improvements.

WorldMaker · 10y ago

Yeah, I definitely agree, especially coming from a source control system before git where mutability was extremely hard (and likely to cause deep, terrible problems), so a lot of great work had gone into making it less likely you would even need to mutate things down the road, such as good interactive defaults to help lead you through exactly what was going into a patch...

Certainly there are a lot of people who seem to prefer imperative mutability, and more power to them, but maybe we can learn from all of this and build better tools too.

chris_wot · 10y ago

What do you propose? If you want to err on the side of commits that show a clear unit of change, with a clear commit message and you make a syntax error by accident, do you seriously think it's a good idea to force a dodgy "oops, syntax error" commit into the mainline code branch?

I agree that code once committed to master should be immutable, but your own private commits should be malleable. It's not a matter of feeling superior or trying to look smarter; it's just common sense that you should try to submit easy-to-understand code changes and remove as much unnecessary extraneous rubbish as possible.

zenlikethat · 10y ago

They're not full of little lies. Projects insist on squash-before-merge to help developers keep their sanity.

Imagine working on a fast-changing block of code with 20 or so other people concurrently. If all 20 of those people have patches submitted to master for review, and all of those patches have multiple commits each, then every time one gets merged, the others will have to rebase onto its changes and fix conflicts for _every single commit_ while the fast-forward plays out. It's a horrible experience. It prevents desirable code from getting in as contributors drop out due to the browbeating.

Squashing commits can be the difference between tediously fixing something once vs. tediously fixing it 20 times. No one needs to know that you changed your mind about calling that struct "ConfigOpts" before it was ever introduced upstream.

WorldMaker · 10y ago

«the others will have to rebase to its changes and fix conflicts for _every single commit_ while the fast forward plays out»

I'm mostly sort of advocating a "rebase none of the things" approach. Fix conflicts only when they happen in a branch (the GitHub PR system very nicely doesn't let you merge branches that conflict with your target branch and with CI information even better it won't let you merge branches that don't build). It's really not a bad experience.

ealexhudson · 10y ago

Sometimes the scaffolding should be left in place to help show others how to build things; other times, it was put up and taken down so many times that the learning from the first few attempts isn't worth it.

I would probably be much happier not squashing stuff if you were able to bisect cleanly to points between branch merges. I don't think that's even theoretically possible; which means that when you're bisecting it's quite possible to pick up half-baked mid-branch points that you have to recognise for the broken rubbish they are - lots of false positives there occasionally, and makes bisect a lot less useful. On a repo with nice squashed commits, you tend to be able to narrow down to the feature very quickly - of course those commits then tend to be bigger etc., but I find that less of an issue.

WorldMaker · 10y ago

Maybe bisect needs a flag like --start-with-merges to focus its efforts on --no-ff merges?

Also, what if there was a tool on top of bisect that could better utilize GitHub PR JSON to target the search pattern? That could even save you some time in the case where you already have CI information attached to your PRs...
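
The proposed `--start-with-merges` flag is hypothetical, but the idea can be sketched in Python: bisect over the mainline --no-ff merge points first, then search only inside the one merged branch, using CI status to skip commits that never built. `BranchCommit`, `ci_passed`, `first_bad`, and `bisect_merges_first` are invented names, and `is_bad` stands in for whatever test you would hand to `git bisect run`. (Newer git releases do offer `git bisect start --first-parent`, which covers roughly the first phase.)

```python
from typing import Callable, List, NamedTuple

class BranchCommit(NamedTuple):
    sha: str
    ci_passed: bool  # hypothetical CI status recorded for each PR commit

def first_bad(n: int, is_bad: Callable[[int], bool]) -> int:
    """Binary search for the first index in [0, n) where is_bad is True.

    Assumes good...good, bad...bad ordering and that index n - 1 is bad.
    """
    lo, hi = 0, n - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

def bisect_merges_first(prs: List[List[BranchCommit]],
                        is_bad: Callable[[str], bool]) -> str:
    """Search the mainline merges first, then inside one merged PR.

    Assumes prs are in merge order, each PR's last commit is the tip its
    --no-ff merge brought in, every tip passed CI, and the newest is bad.
    """
    # Phase 1: test only the states introduced by each mainline merge.
    m = first_bad(len(prs), lambda i: is_bad(prs[i][-1].sha))
    # Phase 2: bisect inside that PR, skipping commits CI says never built.
    testable = [c for c in prs[m] if c.ci_passed]
    k = first_bad(len(testable), lambda i: is_bad(testable[i].sha))
    return testable[k].sha

prs = [
    [BranchCommit("a1", True), BranchCommit("a2", True)],
    [BranchCommit("b1", True), BranchCommit("b2", False),  # WIP, no build
     BranchCommit("b3", True), BranchCommit("b4", True)],
    [BranchCommit("c1", True)],
]
order = [c.sha for pr in prs for c in pr]  # linearized, for the toy test
bug_from = order.index("b3")
print(bisect_merges_first(prs, lambda sha: order.index(sha) >= bug_from))
# b3
```

If the bug had actually arrived in the unbuildable `b2`, this would still report `b3`, meaning "b2 or b3". That mirrors how `git bisect skip` can leave you with a short list of candidates rather than a single commit.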

kylnew · 10y ago

Have you ever tried following a change in a repo that came from an unsquashed PR? It's hell.

What's truly meaningful IMO is a git log that reads like a product change log

WorldMaker · 10y ago

«Have you ever tried following a change in a repo that came from an unsquashed PR?»

I have a self-congratulating black belt in source code archeology. With the right tools, most of which are even on GitHub, such as good commit range diffing, smart uses of tags and branches, and knowing how to navigate the DAG from merge commits (more reason to --no-ff), you have a lot of power in your hands.

«What's truly meaningful IMO is a git log that reads like a product change log»

I appreciate that point of view, but I don't share it. A product change log, I feel, is a bit of marketing/PR that needs some time, love, and editing; I find a git log is for catching snapshots of raw progress and more often useful in seeing what your co-developers are up to, as they are working.

teacup50 · 10y ago

As someone who has spent a lot of time tracking down the origin of source code changes -- and the reasons for them, and the implications of the change -- by reviewing commit logs, I can think of little worse, short of no commit history at all, than trying to derive anything remotely useful from a commit history that has been condensed down to a product change log.

Even commits that lack good commit messages provide valuable information in the form of insight into the author's cumulative thinking/process.

DougWebb · 10y ago

Do you see the commit history as a piece of living art, a representation of the community and culture that has produced it, or do you see it as a tool?

I'm an engineer, and I see the commit history as a tool. When I want to know what a block of code is for and why it was written the way it was, the commit history (if it is clean and granular) will tell me a lot about that, and will point me to authors, issues, features, and requirements where I can learn more. I don't care about the process of producing the code, I care about the end result. I get enough exposure to the process when I'm writing my own code.

WorldMaker · 10y ago

I suppose I find commit history more interesting as a work of art and archeology. Code history will be read more often than it will likely ever be interacted with. Like the scuffs in the marble, the fingerprints and brush strokes and little hairs in watercolor and oil painting, there can be beauty in the little flaws.

I've had it put to me that an Architect deals solely with the art of a project and a Scientist deals solely with the science and theory; it's the work of an Engineer to deal in the practical middle where art meets science (meets the real world).

Sometimes it is easy to overlook (or to want to overlook) the little bits of humanity in the machine; the various sorts of creative chaos in the vast ordered systems; the parts of the code that are art.

There are no easy answers to much of this thread, because it is art, it is aesthetics. There's no "right" answer, just "this looks good and pleasing to me and my team" and working to find that practical Engineering border space between the unwavering art of the Architect and the similarly unwavering logic and discipline of the Scientist.

tomphoolery · 10y ago

There's definitely some merit in keeping the original intent of commit messages even when "committing early and often", but at the end of the day a pull request is supposed to tell some kind of story. When you `git blame` or `git log` a particular file, aren't you more interested in the higher-level changes than a single developer's own miniature storyline of how it got that way?

I'll give you another analogy... If I'm composing a song, and you're not particularly trying to learn how to write songs, what is more relevant to you, the end result or the various drafts and early versions that led to it? For the majority of people who aren't interested in learning the mechanics of songwriting, the "journey" is definitely not as interesting as the destination. For nerds like you and me, that's part of the fun! So I try to keep the original intent of my commits, and preserve their messages in a bullet-pointed list format, to show the individual changes that were made in addition to a higher-level overview of the overall change to the project.

TLDR: Project-level changes are not the same as individual changes, and while both should be represented in commit messages, the project-level changes are overwhelmingly more useful in the future. Git is not about code storage, it's about code communication. It's about developers on the same team communicating with both prose and code in tandem.

kazinator · 10y ago

If you rebase your unpublished work on top of some recent changes, that is in no way a "lie". Doing it via a merge is completely uninformative, in fact. Oh, George was working, by complete coincidence, at the same time, on something completely unrelated and that had to be merged against my change. Whoa, inform me more softly there, I can't take the raw cognitive overload!

Multi-parent commits complicate the use of git. When a commit has a single parent, we can pretend that it's a delta: a patch. (Like it fscking should be in a decent version control system based on some sort of patch theory!) When we "show -p" that commit, we get a diff, which is against its one and only parent. Multiple parents also complicate certain situations. They rear their ugly heads and create an ambiguity. For instance, consider git cherry-pick. If a given commit has just one parent, we can pretend that it's a delta and "git cherry-pick" it. If it has multiple parents, the ugly truth is revealed: a git version isn't actually a delta. If you want that change, you need to specify the parent!

The parent of a commit should be the thing that the work was actually based on: the work that the developer took and massaged to create the new baseline representing the commit. When you have multiple parents, only one of the parents actually meets this definition. The others are arbitrary nodes in the system which are just installed as the parents.
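
The cherry-pick point can be illustrated with a toy Python model: a single-parent commit has exactly one delta, but a merge has one delta per parent, so a patch-style operation has to be told which parent to diff against. The snapshot dicts and the helper names (`delta`, `apply_delta`, `cherry_pick`) are invented for illustration. Real `git cherry-pick` performs a three-way merge rather than blindly overwriting files, and its `-m <parent-number>` (`--mainline`) option is what selects the parent, counting from 1.

```python
from typing import Dict, List, Optional

Snapshot = Dict[str, str]  # path -> contents; a toy stand-in for a git tree

def delta(parent: Snapshot, child: Snapshot) -> Dict[str, Optional[str]]:
    """Changes that turn parent into child (None = path deleted)."""
    paths = sorted(set(parent) | set(child))
    return {p: child.get(p) for p in paths if parent.get(p) != child.get(p)}

def apply_delta(base: Snapshot, change: Dict[str, Optional[str]]) -> Snapshot:
    """Replay a delta onto another snapshot by overwriting paths."""
    result = dict(base)
    for path, contents in change.items():
        if contents is None:
            result.pop(path, None)
        else:
            result[path] = contents
    return result

def cherry_pick(parents: List[Snapshot], commit: Snapshot,
                onto: Snapshot, mainline: Optional[int] = None) -> Snapshot:
    """Toy analogue of `git cherry-pick -m`: mainline is 1-based."""
    if len(parents) > 1 and mainline is None:
        raise ValueError("commit is a merge; choose a mainline parent")
    parent = parents[(mainline or 1) - 1]
    return apply_delta(onto, delta(parent, commit))

base = {"a.c": "old", "b.c": "old"}
mine = {"a.c": "mine", "b.c": "old"}      # my branch changed a.c
george = {"a.c": "old", "b.c": "george"}  # George's unrelated change
merge = {"a.c": "mine", "b.c": "george"}  # merge with parents [mine, george]

print(delta(mine, merge))    # {'b.c': 'george'}
print(delta(george, merge))  # {'a.c': 'mine'}
print(cherry_pick([mine, george], merge, onto=base, mainline=1))
# {'a.c': 'old', 'b.c': 'george'}
```

With `mainline=1` the merge replays as George's change; with `mainline=2` it replays as mine. Neither is "the" delta of the merge, which is exactly why a multi-parent commit can't be treated as a plain patch.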

dahart · 10y ago

Speaking of trust, I have a really hard time taking seriously the argument that squashing or rebasing is "lying". If you'd stuck to "meaningful", I'd consider it seriously. But "lying"? This may be git's biggest flame war; it's an old and tired debate, and calling it "lying" is an over-the-top, ham-fisted value judgement of a technical workflow that is a personal choice with legitimate reasons to go either way.

Also, Linus endorses cleaning up your WIP commits before pushing.

wheels · 10y ago

I don't find "seeing how the sausage was made" simply of interest. It's often very instructive.

Often there will be counter-intuitive bits of code that make sense in "git blame" if you see the original, small, commit that they were created as part of. If they're part of a 1000+ line feature bomb, you lose that important context.

chris_wot · 10y ago

This was only recently, but for a while I tried keeping my commits as small as possible in LibreOffice so that I didn't break the build. Unfortunately this caused major issues in backporting fixes, so I changed my practice.

Whilst my small commits were squashed, it shows that even well formed smaller commits that are part of a larger change can often be problematic.

Keeping your commit history clean is important. When I'm bisecting, I don't want to see coding errors like typos and syntax errors; they literally get in the way of the bisect. And when I'm reading through a source file, I'd like each commit to be significant, or at least entirely relevant to a change. Minor syntax error fixes, whilst they can still sneak into the master repository, should be few and far between.

Basically, it also encourages unit testing, rechecking your code, continuous integration, and a raft of other good and best practices around coding. And your colleagues will thank you.

Merge squashing is actually a pretty decent way around this, and I'm going to use it as my workflow now. I'll make frequent code commits on a separate branch, then squash them down into another branch, then push that.

forrestthewoods · 10y ago

I'm with you. At least for most professional environments.

It's weird. Git was made for something very specific, Linux kernel development. It made a lot of decisions to support that environment. However most of us don't work in that type of environment.

If you have a private repo for your job you're in a very different environment. At my job mutating history is the opposite of what I want. I don't want people mutating history. Ever!

Part of the problem, in my opinion, is that Git encourages tiny commits. I might take it a step further and say that Git mandates tiny commits. Too tiny in my opinion. When that's forced upon you a clean mechanism is required. But I'm not sure it isn't sweeping another problem under the rug.

But I'm weird. I'm a game developer. We all use Perforce. It just works. You can't fuck it up. You can't ruin history. You can't get stuck. Artists and designers can be trained to use it from scratch in 5 minutes. It's so easy to use there aren't tens of thousands of blog posts desperately trying to explain how easy it is to use.

exDM6910y ago

I'm from a Git background but we use Perforce at work. I wish there were blog posts desperately explaining how simple P4 is because I can't understand how to use it for software development.

To me it seems like a big dump of files like a network mount with locking and some kind of history. But how the hell is one supposed to write software with it?

It has complicated tools for sharing incomplete work. I don't know how you do code reviews but we have a Perl script for that(!).

In other words: if you think p4 is simple and git is not, it's because of your background.

maccard10y ago

> It just works. You can't fuck it up.

Unless someone checks out everything in the depot by accident..

vbit10y ago

I agree with you. In fact I think the reason we have to do squashes and lose fidelity is because the git model is broken.

There is no technical reason a source control tool could not offer multiple 'views' into the commit history: a high-level linear view which you can zoom into to see the underlying commits and merges. Why do I have to lose the latter to see the former?

wiml10y ago

Bzr+launchpad does this nicely, I think. The underlying history of the merged-in branch is there, and you can look at it, but by default it looks like a single atomic change.

sethammons10y ago

I'm not a big fan of clean history for the reasons you state. However, there is at least one big benefit of a clean history in the master branch: any commit can be checked out and assumed to have working code. This means you can use git bisect. Git bisect allows you to programmatically search through your commit history to identify when a certain behavioral change happened. If you have commits like "wip, not sure why the app wont start yet" in your master's history, you cannot leverage tooling like bisect. Let the sausage be created at the branch level, and keep master clean </0.02>.
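To make the bisect point concrete, here is a sketch under made-up assumptions: a throwaway repo where the hypothetical file `version.txt` must never contain the string BUG, and the regression arrives in the sixth of eight commits. `git bisect run` calls the test command at each step (exit 0 means good, most non-zero codes mean bad) and stops at the first bad commit, which only works cleanly if every commit it lands on is testable:

```shell
#!/bin/sh
# Sketch only: git bisect run over a history where every commit is testable.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Eight commits; the hypothetical regression ("BUG") first appears in change 6.
for n in 1 2 3 4 5 6 7 8; do
  if [ "$n" -ge 6 ]; then
    echo "change $n BUG" > version.txt
  else
    echo "change $n" > version.txt
  fi
  git add version.txt
  git commit -q -m "change $n"
done

good=$(git rev-list --max-parents=0 HEAD)   # root commit, known good
git bisect start HEAD "$good" >/dev/null
# Test command: exit 0 (good) when version.txt has no BUG, non-zero (bad) otherwise.
git bisect run sh -c '! grep -q BUG version.txt' >/dev/null
first_bad=$(git rev-parse refs/bisect/bad)
git bisect reset >/dev/null 2>&1

git log -1 --format=%s "$first_bad"   # the first bad commit's subject
```

A test command can also exit with 125 to tell bisect to skip a commit it cannot test, which helps with the occasional broken commit but not with a history full of them.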

atmosx10y ago

Depends on two things IMHO:

(A) how important the commit is and (B) how many lines of code are changed.

Squash usually makes sense when you have a branch with multiple small commits that affect the same thing. It doesn't make sense to squash 2 commits affecting large parts of the application just because your commit has to be squashed before merging.

dap10y ago

I understand the point of view that it's useful to see what "actually" happened. But the other way to think about it is like this: at some point, when a developer in the future wants to understand how #master evolved over time, should the burden of linearizing the history rest with that developer (making sense of a complex graph), or with people who make changes to #master when they make them?

WorldMaker10y ago

Obviously, --no-ff merge commits give you big signboards for groups of changes today.

Other than that, maybe all of this is an indication of a need for a meta-UX over the change graph to annotate and describe subgraphs in new ways.

Daviey10y ago

Projects that use a rich code review tool like Gerrit, such as Android, OpenStack and Wikipedia, keep the truth in the code review tool, but the actual git tree is kept minimal and clean.

If you want to dig into the actual commits and the dirty truth under the surface, use Gerrit; you can even use git itself to pull down the truth.

I'm expecting this github feature to be the same, the pull request probably keeps the truth somehow.

WorldMaker10y ago

I see this as a tooling problem. Why use two tools when you can use one? Why is "code review" not a first class citizen in your source control world?

I don't have all the answers of what the tooling should be; I just think this is as good an opportunity to discuss it as any.

GitHub's long-standing --no-ff merges at least are one way of preserving the code review and its internal changes directly in the git DAG. This mostly works, except for tools like git bisect that treat the DAG as if it were a straight line rather than making use of the fact that the system already supports complicated graphs.

Furthermore, along the questions of why use two tools to navigate the code repository: I sort of wish that things like GitHub PR comments and code annotations made their way somehow into nodes in the git DAG.

quanticle10y ago

To me, the advantage of squashing is that it makes your changes atomic, like database transactions. If you need to back or forward port your functionality to another branch, it's easy. You just cherry-pick a single commit and you're done. If you don't squash, then it's not nearly as easy to automatically move features from one branch to another.
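A sketch of that backport, assuming a throwaway repository and a hypothetical maintenance branch `release-1.x`: because the feature landed as one squashed commit, a single `git cherry-pick` carries all of it, and `-x` records the original commit id in the new message:

```shell
#!/bin/sh
# Sketch only: backport a squashed feature commit with one cherry-pick.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "core" > core.txt
git add core.txt
git commit -q -m "Initial commit"
git branch release-1.x   # hypothetical maintenance branch

# The feature lands on master as one squashed commit touching two files.
echo "feature code" > feature.txt
echo "feature docs" > FEATURE.md
git add feature.txt FEATURE.md
git commit -q -m "Add feature (squashed PR)"
feature_commit=$(git rev-parse HEAD)

# Backport: one cherry-pick; -x appends "(cherry picked from commit ...)" to the message.
git checkout -q release-1.x
git cherry-pick -x "$feature_commit" >/dev/null
git log --oneline -1
```

With an unsquashed branch you would instead pick a range of commits and hope none of them depends on something that isn't on the target branch.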

hyperpallium10y ago

Did you hit backspace while writing this? I use explicit ^H because I'm honest.

chx10y ago

This discussion needs to happen because git is fundamentally flawed. There should be no way to change the history but at the same time you should be able to hide the immaterial commits. (Yes, bzr had this. I still mourn over the demise of bzr.)

lucashuang10y ago

+1 for the last paragraph. As more people become developers, this convenient feature saves a lot of headaches. At the very least, it saves time for those who don't want to get deeper into GitHub.

hashkb10y ago

Squash isn't for linear history. That's the rebase vs merge-master-downstream debate. Squash is for getting your wip commits out of history, that kind of thing.

teacup5010y ago

Many of us believe there's no such thing as a throw-away "WIP" commit, and if there is, the work should be better broken up and managed by the developer so that they're committing fully considered incremental progress.

The easiest time to catch a bug is when it's hiding in a 10 line diff, especially before you commit it.

It gets progressively more difficult as the scope of the changeset grows.

Gmo10y ago

You're not alone man, but I feel that we are indeed in the minority ...

It looks like it comes down to the style of the dev/team, because I NEVER do any kind of WIP commit.

sebastianconcpt10y ago

I'm totally into this style. As a non-linear thinker, I value history a lot. The other style is too much into-the-box thinking.

Bjartr10y ago

Would there be value in being able to both preserve the true history and the retcon'd one? If so, is it feasible?

matchagaucho10y ago

I love squash. Go to the branch for the details. Keep master clean and simple.

EdHominem10y ago

> to essentially make the git DAG look like a (lie of a) straight-line CVS or SVN commit list.

Err, it doesn't make it look like the "lie of a" straight line, it makes it into a straight line. Whatever the other developers do, I move the project forward one ball of functionality at a time when their changes are useful to mainline.

When you use software, why do you run a "release" instead of whatever happens to be in the dev's directory when they leave for lunch? Don't you feel dishonest getting the version without the bugs?

> Seeing how the sausage was actually made (no rebases, no squashes, sometimes not even fast-forwards) isn't pretty, but it is meaningful

Perhaps if I was going to hire you it'd be interesting to glance at how you work with nobody looking. Do you keep your desk tidy or not?

But it's absolutely irrelevant to the final project and as such, it shouldn't be stored.

> I trust that. It's real and visceral and how software is actually made

You should read Tracy Kidder's _The Soul of a New Machine_, it's a good read about sausage.

But it's not how you should work, because you have a choice now.

rimantas10y ago

Yes, it tells the story. Does it help to understand the history of the code? Not necessarily. I care most about what came from where, and when, not some fiction around that.

WorldMaker10y ago

I appreciate you using the word "fiction", but I think you have it backwards. The cleaned up post-facto linearization and "cleanup" is writing a fiction, telling a story about the changes rather than actually being the changes. This is great and I realize that that has its uses to build good stories about what our code is.

I'd like to think, however, and I think that this is my larger point in this thread, that maybe we could build better storytelling tools that don't delete/mangle/mutate the actual history so that we can see in the same repository both the story and the raw facts.

draw_down10y ago

I disagree, it's mostly not very useful information.

cballard10y ago

This is a bad idea masquerading as a good idea. Before making a pull request (or doing any sort of merge), you should rebase against upstream master (or whatever you're going to push to). However, keeping distinct atomic commits that change one and only one small thing, when possible, is much preferable if bisect or blame is used. If you have broken or poorly written commits, use fixup, reword, squash, etc. in rebase -i.

Using fast-forward (and possibly only allowing fast-forward) is a good idea. Squashing entire pull requests that may change multiple things into a single commit is a very bad idea.

JoshTriplett10y ago

If someone prepares a pull request with a well-structured series of commits, making a logical series of changes, where the project builds and passes tests after each commit, then those commits shouldn't get squashed.

However, I frequently see people adding more commits on top of a pull request to fix typos, or do incremental development, where only the final result builds and passes, but not the intermediate stages, and where the changes are scattered among the commits with no logical grouping. In that case, I'd rather see them squashed and merged than merged in their existing form, and having a button to do that makes it more likely to happen.

nhaehnle10y ago

The trick is to not squash everything into one giant commit, but to use rebase -i liberally to squash/fixup those typo fix commits where they belong.

andersonvom10y ago

Plain squashing commits, while still a valid option in very few cases, will likely lead to gigantic commits that are hard to reason about.

I've seen projects where maintainers clean up poor commits before merging them: rebase/squash/reword only what's appropriate.

Tyr4210y ago

It's also the case that you lose the code review if you force push to a PR's branch after adding in a typo fix and squashing locally, right?

That's a pretty good reason not to squash till the review is done.

michaelmior10y ago

I used to feel the same way (and still do to some degree). However I think the issue is more nuanced. I agree that rebasing beforehand is a good idea. But I can see the value in keeping commits on the master branch corresponding to specific features or bug fixes (which presumably map to PRs).

I think the argument can be made that if you don't feel comfortable performing a squashed merge of a PR, then that PR contains too much work and should be split up. However, I don't think there's an easy rule to decide in either case.

cballard10y ago

Small PRs are an issue because PRs are dependent on other users and can't be dependent on a prior PR.

Let's say we're adding an interface/typeclass/protocol and a concrete implementation. I'd say these should be two separate commits, as they're adding two different things. An interface doesn't require a provided implementation to work. But, if we were to create those as two separate pull requests, it would be more work for the project maintainers, and the initiator wouldn't be able to create the PR for the concrete implementation until the interface PR was merged - the concrete PR can't be added as a dependent PR of the interface one, or something to that effect.

Since you can "compare" almost anything on Github, small commits aren't really an issue, just view a larger-scope comparison to get an idea of the whole PR.

Another way to put this might be that commits are for individual code changes that build up to a pull request, which is a conceptual change?

BinaryIdiot10y ago

> Before making a pull request (or doing any sort of merge), you should rebase against upstream master (or whatever you're going to push to)

See, and maybe this is because I'm just dumb or something, but I have never gotten rebasing to work for me. Ever. Every single time I do it I read at least 3 articles about it so I don't screw something up, I attempt to do it, and ultimately I lose a bunch of work.

I just don't get it. I can write web, mobile and desktop apps and I like to think I'm pretty good at it. But I'm one of those people who constantly have merge commits in their code because, for whatever reason, I just can't get my head around making rebasing work correctly.

Am I the only one? Sorry for the derail but it's bothering me that I've never gotten this to work correctly and I feel otherwise normally smart. ¯\_(ツ)_/¯

azernik10y ago

A few tips!

1. Always use the "upstream" branch as your rebase target: "git rebase -i master" or "git rebase -i origin/master". This is almost always what you want, and picking the wrong base is the most common error I've seen when teaching people rebase -i.

2. Use autosquash! https://robots.thoughtbot.com/autosquashing-git-commits. If you have trouble with the text-editor interface you get when you run rebase -i, this will fill in the todo list for you, and in the long run give you some visual examples of how the interface is supposed to be used. If you're really into this, set the config option rebase.autoSquash to true to avoid the extra command-line flag.

3. If you mess up and realize in the middle, git rebase --abort.

4. Use the reflog after the fact for both finding and undoing mistakes: git diff branchname branchname@{1} to check for unintended code differences, and git reset --hard branchname@{1} to undo the rebase.
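Tips 1, 2 and 4 together, as a sketch in a throwaway repository (file and branch names are made up). `git commit --fixup` creates a `fixup! <subject>` commit, `--autosquash` moves it next to its target, and setting `GIT_SEQUENCE_EDITOR=true` accepts that todo list without opening an editor, which is done here only so the script runs unattended:

```shell
#!/bin/sh
# Sketch only: fixup commit + autosquash, then check and undo via the reflog.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "base" > lib.txt
git add lib.txt
git commit -q -m "Initial commit"

git checkout -q -b feature
echo "parse input" > parse.txt
git add parse.txt
git commit -q -m "Add parser"
parser=$(git rev-parse HEAD)
echo "render output" > render.txt
git add render.txt
git commit -q -m "Add renderer"

# A late typo fix that belongs in "Add parser": creates a "fixup! Add parser" commit.
echo "parse input, fixed" > parse.txt
git commit -q -a --fixup="$parser"

# Tips 1 and 2: rebase onto the upstream branch; --autosquash folds the fixup into its target.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash master
rebased=$(git rev-parse feature)
rebased_count=$(git rev-list --count master..feature)   # 2: the fixup is folded in

# Tip 4: compare with the pre-rebase tip (empty diff), then undo the rebase from the reflog.
git diff --stat "feature@{1}" feature
git reset -q --hard "feature@{1}"
```

The empty diff is the useful check: a cleanup rebase should change the history, not the final code.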

joshuahutt10y ago

  $ git checkout master
  $ git pull
  $ git checkout branch-name
  $ git rebase master

If there are merge conflicts, open the affected file(s) and resolve them. Then:

  $ git add filename.ext
  $ git rebase --continue

Finally:

  $ git push origin branch-name

If you've already pushed the branch, use -f. Make sure to always specify the branch name when using that flag!

tunesmith10y ago

I think the advice to rebase runs up against the business pattern of pushing your branch as soon as you create it (git-flow and a lot of jira/stash integrations work like this). Also some teams want to see evidence of your commits as you make them, which means pushing as you commit.

If you have a branch and it's already pushed, rebasing just feels kind of funny and can sometimes cause a lot of problems if anyone else has checked it out.

If you have a branch and it's local only, then merging from mainline into your branch and selecting rebase instead of merge is relatively painless.

yuubi10y ago

> ultimately I lose a bunch of work.

One trick that's worked ok for me in a private repo is, before starting to edit the fix-spline-reticulation branch (which has a handful of separate logical changes, fixes discovered midway through a later change that really belong in an earlier change, and temporary debug code that was never meant to go into the product) for publication, to do

    git branch fix-spline-reticulation.0

(or .the-next-sequential-number). Then no matter how badly the "rebase -i master" goes, there's a branch tag pointing at the original state, and

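    # git refuses to delete the checked-out branch, so step off it first
    git checkout fix-spline-reticulation.0
    git branch -D fix-spline-reticulation
    git branch fix-spline-reticulation
    git checkout fix-spline-reticulation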

will destroy the failed attempt and restore the branch to its earlier state. (Note that if you decide in the middle of the rebase that you're losing, "git rebase --abort" will undo anything you've done so far; you need the backup only if you regret the rebase after you're finished.) It also makes it easy to "git diff fix-spline-reticulation.0..fix-spline-reticulation" and confirm that all the changes in the edited history add up to the same as the real history.
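A scripted sketch of the backup-branch trick, assuming a throwaway repository. To keep it non-interactive, `git reset --soft master` plus one commit stands in for the `rebase -i master` cleanup; the point is the `.0` branch, which makes both the check and the restore trivial:

```shell
#!/bin/sh
# Sketch only: keep a backup branch while rewriting, verify, then restore.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "v1" > spline.txt
git add spline.txt
git commit -q -m "Initial commit"

git checkout -q -b fix-spline-reticulation
echo "v2" > spline.txt
git commit -q -am "Reticulate splines"
echo "debug" > debug.txt
git add debug.txt
git commit -q -m "Temp debug output"
echo "v3" > spline.txt
git commit -q -am "Fix that belongs in the first change"

# 1. Backup pointer to the original state.
git branch fix-spline-reticulation.0

# 2. Stand-in for the rebase -i cleanup: collapse everything into one commit.
git reset -q --soft master
git commit -q -m "Reticulate splines"

# 3. The edited history must add up to the same end state as the original.
same_tree=no
if git diff --quiet fix-spline-reticulation.0 fix-spline-reticulation; then
  same_tree=yes
fi

# 4. Regret it? Step off the branch, delete it, and recreate it at the backup.
git checkout -q fix-spline-reticulation.0
git branch -q -D fix-spline-reticulation
git branch fix-spline-reticulation
git checkout -q fix-spline-reticulation
```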

Sometimes I do this in the middle of development to move all the changes intended for the product ahead of the temp debug stuff in case I suspect the debug code is causing problems. Keeping the debug code in the dev branch even after the cleanup rebase makes the diff to check the rebase easier (then, of course, the merge should take the commit just before the debug).

Best never to let anything but the cleaned-up branch hit a shared repo.

lomnakkus10y ago

> See, and maybe this is because I'm just dumb or something, but I have never gotten rebasing to work for me. Ever. Every single time I do it I read at east 3 articles about it so I don't screw something up, I attempt to do it and ultimately I lose a bunch of work.

Rebase takes a little bit of practice, but everyone who's using git owes it to themselves to learn it by heart. It's almost like having superpowers compared to any VCS which doesn't have rebase.

My advice[1] would be to simply create some dummy repository (perhaps just copy an existing repository with some real code) and going through various scenarios described in the git-rebase man page (using some trivial changes). If something blows up, don't worry, you can always just start from scratch.

The key to making rebase work for you is: 1) understanding the underlying model of git[2], and 2) practice, practice, practice. With enough practice you'll get a good feeling for which "type" of rebase works best in a given situation.

[1] In addition to the excellent advice given by others in this thread.

[2] It may look like it's really all based on snapshots of files, but the workflows are definitely mostly centered around patch-based thinking.

nugator10y ago

I know that I probably swear in church since git is the current de facto standard for version control but this shows that gits usability is way too low. Why do I have to invest so much time to understand the inner workings of a tool that should just help me collaborate with my coworkers? I've given up on understanding git and use gitflow and the built in tools in Intellij Idea for all my branching/merging/committing needs.

Estragon10y ago

  ultimately I lose a bunch of work.

Take a copy of the entire repository before attempting anything potentially destructive.

chopin10y ago

In tricky situations, I always commit work done. Then I attempt to do potentially harmful work. Note that afaik you can't lose commits in your history (they may be hidden, but reflog to the rescue). If I am very unsure whether something will work as intended, I place a dummy branch (a tag will do as well) onto that safety commit which will make it easier to find it back (in that case you don't need to resort to reflog). I never lost work once committed even when I painted myself into a corner. Note as well that rebase -i will always create a new commit rebased onto the entry commit. Going back to where you started is always possible.

brown9-210y ago

Use `rebase --interactive` so you can have a better idea of what is going on.

blakeyrat10y ago

I don't blame you. Git has terrible usability.

zb10y ago

I think that GitHub's pull-request-based model is fundamentally broken. Gerrit's model, where every commit is quasi-independent (and hence must pass tests) and you can easily edit without force-pushing anything or losing review history, is superior (though not perfect) in almost all cases. (Exception: merging a long-running feature branch where all the commits in the branch have already been reviewed.)

This is GitHub's attempt to solve the problem without really changing anything. It won't really change anything. Since pull requests routinely contain a mixture of both changes that should be squashed (fixups) and changes that should not be squashed (independent changes), this just means that you get to pick your poison.

natrius10y ago

I used to strongly believe what you do until my company started using Phabricator, which forces the squash workflow on you. It makes your history more useful, not less. The pull request is the appropriate unit of change for software. Make small commits as you develop, then squash them down into a single meaningful change to the behavior of your software.

dsmithatx10y ago

As a git novice I wonder, doesn't a proper workflow do the same thing? When I submit a feature branch it might have a lot of ugly commits. However, once I merge it to an integration branch there is one nice commit explaining what I did.

When coworkers create pull requests, I don't go through all of their commits and changes along the way. I just look at the diff, so I don't see the need for them to squash it first.

scrollaway10y ago

Sometimes yes, sometimes no. Merge commits are nasty imo and I'm glad we can now forbid them outright, but a full squash isn't always the solution either.

Take a look at this PR for example:

https://github.com/HearthSim/python-unitypack/pull/4

Lots of back & forth. All the commits are related, and the PR is there to land all of those commits at once. I could land some of them right now (as they're safe to land), but keeping them in the PR keeps everything related in the same place (and none of them are required until that last commit lands).

A PR mirrors a "patchset" on mailing lists. You don't always want to squash all of it.

What you do want to avoid is a situation like this:

https://github.com/jleclanche/django-push-notifications/pull...

Where the original author creates their original commit and doesn't know how to --amend + push --force to the PR, and you end up with a ton of commits which you don't want to land all at once.

glhaynes10y ago

It seems like it'd be nice to have two levels of granularity exposed in views of a source control system's history, basically corresponding to pull requests and commits. So you could drill-down to individual commits as needed, but would normally be able to work at the PR level.

SEJeff10y ago

Nope, it does this by default, but doesn't force it.

  $ cat ~/git/ATLAS/.arcconfig
  {
    "project_id" : "ATLAS",
    "repository.callsign" : "ATLAS",
    "conduit_uri" : "https://phabricator.$MYCOMPANY/",
    "arc.land.onto.default" : "develop",
    "immutable_history" : true
  }

3JPLW10y ago

Sure, it's a terrible feature to always use. And it's likely to be of little use to contributors who know how to use Git well. But in large open-source projects, often new contributors make a small change that needs a few minor corrections. Eliminating that final back-and-forth ("squash please") is a huge win for maintainers.

Bahamut10y ago

Not only maintainers - anyone tracing back through the history to find out what broke their use case.

haberman10y ago

There's no guarantee that every individual commit of a feature branch is meaningful, or even builds. It also makes the history of the master branch a lot harder to read when it has tons of commits representing the minutiae of the feature's development.

jakub_g10y ago

It really depends on each individual's workflow. I tend to use lots of "in progress" commits (each time things are "green"), and as I go, I regularly squash the commits, so in the final pull request I typically have several commits (if I weren't squashing, it would have been a dozen). If I do a feature and a refactor, they are always separate commits; it's easier to review these and bisect if something turns out to go wrong.

Some people might do similar things but they might not assure each commit is green, and they never squash anything (so you end up with non-meaningful commits).

As @3JPLW said, I can see how it can be useful for open-source maintainers to have the option to squash someone's commits when the change is small but there are many commits (due to review ping-pong, etc.).

echion10y ago

There's no guarantee, but there are many benefits to striving for this ("git bisect run", CI test results).

yxhuvud10y ago

If it isn't meaningful, then that is something the review stage should catch.

falsedan10y ago

I use this workflow:

  1. branch off master
  2. work, commit, push, test (on CI server)
  3. decide it's time to ship
  4. rebase -i, push, test (again)
  5. git checkout master && git merge --no-ff feature_branch

(make the merge commit message a summary of the feature)

master ends up being a list of feature branch commits, bookended by the merge commit which introduced the feature. Getting the squash commit diff is as easy as 'git diff feature_branch_merge^..feature_branch_merge'.
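The same workflow as a sketch in a throwaway repository (file names and messages are hypothetical; the CI runs and the `rebase -i` cleanup are left out). `git log --first-parent` on master then shows one entry per feature, and diffing the merge commit against its first parent gives the squash-style view:

```shell
#!/bin/sh
# Sketch only: feature commits kept, bookended by a --no-ff merge commit.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "core" > core.txt
git add core.txt
git commit -q -m "Initial commit"

# Steps 1-4, simplified: two commits on the feature branch.
git checkout -q -b feature_branch
echo "a" > a.txt
git add a.txt
git commit -q -m "Add a"
echo "b" > b.txt
git add b.txt
git commit -q -m "Add b"

# Step 5: the merge commit message summarizes the feature.
git checkout -q master
git merge -q --no-ff -m "Feature: add a and b" feature_branch
feature_branch_merge=$(git rev-parse HEAD)

# One entry per feature when following only first parents.
git log --oneline --first-parent master
# The squash-style diff of the whole feature.
git diff --stat "$feature_branch_merge^..$feature_branch_merge"
```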

phasmantistes10y ago

Squashing entire pull requests that change multiple things into a single commit is a bad idea, yes. But uploading and asking for review on such wide-reaching pull requests is a bad idea in the first place.

Using fast-forward without squash is also a bad idea in many cases: the string of commits may contain multiple points that don't actually build or pass tests, even if the final commit in the chain fixes all that. There's no point in landing those broken commits, and doing so will confuse bisection tools.

Fast-forward with squash, and enforcing reasonably sized code reviews as a matter of culture, is the best of all worlds in my opinion.

golergka10y ago

Why would you want to rewrite your whole work history and change the actual state of the repository at each of your commits? Why not just merge?

draw_down10y ago

Rebasing seems to clutter the Github PR's commit history and diff with all the commits to master that were made between the time the branch was cut and the time the rebase happens. But it doesn't do that if you merge in master. I never understood this.

jgraham10y ago

It's a bad idea because it's a bad implementation. If it allowed you to select what to squash, defaulting to the behaviour of git rebase -i --autosquash master then it would be a clearly good feature.

serge2k10y ago

> Squashing entire pull requests that may change multiple things into a single commit is a very bad idea.

If changes are too large/complex/disjoint to fit in a single commit, then why have them in one PR?

jibsen10y ago

I wonder why they did not add `--ff-only` as an option, like GitLab has.
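For comparison, a sketch of what `--ff-only` enforces on the command line, in a throwaway repository with a hypothetical branch `topic`: the merge is refused while the branches have diverged and becomes a plain fast-forward once `topic` is rebased onto master:

```shell
#!/bin/sh
# Sketch only: --ff-only refuses diverged history until the topic branch is rebased.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"

git checkout -q -b topic
echo "topic" > topic.txt
git add topic.txt
git commit -q -m "Topic work"

git checkout -q master
echo "other" > other.txt
git add other.txt
git commit -q -m "Meanwhile on master"

# Diverged: a fast-forward-only merge is refused and nothing changes.
if git merge --ff-only topic >/dev/null 2>&1; then ff_before=yes; else ff_before=no; fi

# Rebase topic onto master, after which the same merge is a pure fast-forward.
git checkout -q topic
git rebase -q master
git checkout -q master
git merge -q --ff-only topic
```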

MBlume10y ago

If everyone on your team actually knows how to use git, much better to let them rebase their commits and mark out a series of clean, atomic commits which introduce the feature you're reviewing. If you have people who are incompetent at using git on your team, this feature will help protect your history from them.

munificent10y ago

> incompetent at using git

This unfairly places the blame for Git's utterly shitty UX on the part of the users. When you have thousands of users who struggle to use a tool correctly, it's the tool's fault, not theirs.

I've been using Git for years, work professionally full time on an open source project that lives on GitHub, maintain several open source projects with a number of committers, and generally live and breathe Git and GitHub all day.

I still fucking hate rebasing and get tripped up by it on the few times I end up having to deal with it.

jh310y ago

> I still fucking hate rebasing and get tripped up by it on the few times I end up having to deal with it.

When I had no understanding of what was going on, I didn't like it either. Now that I use it frequently, I understand it better, so I don't hate it anymore.

I like running `git rebase <main-branch>`, where <main-branch> is typically master, in my-new-thing branch because it lets me deal with any conflicts from upstream one by one.

I also like running `git rebase -i` in my-new-thing when I have a bunch of commits with redundant messages that I want squashed into a single commit before I push the changes. Basically anything that requires messing around with a range of commits is a good use case for `git rebase -i`.

Why do you hate it so much? There's really not much going on that you should have to hate. To me it's like a bunch of small, compartmentalized merges.

jecjec10y ago

It is incredible to me to see how many people in this industry will sit here and defend an agonizingly terrible tool. Total Stockholm Syndrome.

blakeyrat10y ago

Git's usability is a mess. I wouldn't judge anybody on their ability to use Git.

Being subjected to Git at work makes me long for the days when I used Lotus Notes for email. Sure Lotus Notes is a usability disaster, but at least you get the sense they were trying to make things work. And, to be fair, it got slightly better each version. Using Git makes you feel like it was developed by people who simply hate you and want to see you fail.

blakeyrat10y ago

And once again, giving my opinion on Git apparently is worthy of downvotes on this site. Make sure you don't go against the groupthink, folks.

spleeyah10y ago

Unsure if April Fool's joke...

mchahn10y ago

It would be the most boring April Fools joke of the day. As a real thing I find it quite useful.

joshmanders10y ago

Would be the worst April Fools joke ever. Even above "drop mic" by Google.

alanh10y ago

Confirmed as not a joke. (The change is subtle as you have to click the green Merge button before being presented with the ability to choose a merge type via a combo button/dropdown component.)

ianleeclark10y ago

I'd cry.

moby10y ago

Not a joke. Squash away. :)

Pxtl10y ago

This is a presentation issue masquerading as a data issue. If somebody suggested deleting data because a report was ugly, they'd be laughed out of the room.

Give us tools to mark commits as unimportant or group them together as a meta-commit object for history purposes.

codingWithGit10y ago

This is good insight; it makes a lot more sense. I think this will take care of 99% of use cases for squashing/rebasing.

However, I think this is something that will have to be built into git itself, not by Github.

tedmiston10y ago

> If somebody suggested deleting data because a report was ugly, they'd be laughed out of the room.

If somebody suggested keeping data that showed every step of a person writing a report, they'd be laughed out of the room.

It's a gray area that varies significantly by feature complexity and team dynamic. I find it frustrating when someone merges a branch with 1 real commit and 3 more one-line commits or trivial fixes that should have been in the first.

greg0ire10y ago

It's equally frustrating to have cs commits fixed up with a meaningful commit, making the diff of the meaningful commit ugly.

jgraham10y ago

So that's essentially what Mercurial's evolve extension does. Which is pretty useful in some circumstances, e.g. if you have a review tool where you want to support rewriting history in a nice way (the typical approach to this in git tools is either a) don't meaningfully support history rewriting, b) require manual ceremony for each history rewrite, or c) add unique ids to each commit that stay immutable across rewrites).

Whilst it's clear that retaining the pushed history can be useful in some cases, I don't understand the notion that disallowing history rewrites helps retain useful data in general. Developers can make commits for all kinds of arbitrary reasons, e.g. they reached the end of the day, or had to switch branches to work on a different patch. That doesn't seem like a particularly useful thing to record. To take it to an extreme, I wonder if any of the people who think that the precise commit history gives useful information about how a feature was developed have configured their editor to commit on every keystroke, since that seems like the logical conclusion of that position (and indeed is effectively what tools like Etherpad do). I suspect not because, actually, being selective about the information that you keep is rather helpful. During a rebase, the developer acts as curator, selecting the most useful representation of a set of changes for the benefit of future readers. Or, to use your analogy, if someone suggested removing meaningless data from a report to focus attention on the most important points, they would certainly not be laughed at, but praised.

Of course GitHub's implementation is too blunt a tool to be really useful, but hopefully we will eventually get something better.

liquidise10y ago· 3 in thread

Oh god no. I made the mistake of moving a team to squashed commits once. The lack of individual commits poses large problems down the line. 2 nonstarters come to mind:

1: It completely ruins your ability to git bisect any bug introduced in your branch. Instead of getting a 10-line commit, bisect will point you to hundreds or thousands of lines.

2: All code will blame to a single person. Code with 6 people on a large branch? Want to git blame the code to see who wrote the function that is weird looking so you can ask questions? Too bad.

Do not squash branches on teams. One of the biggest mistakes of my professional career.

dsymonds10y ago

This is for a "pull request", namely an individual logical change. Your two points don't apply. If you're dealing with whole branches with multiple developers, stick with merging, but this isn't that.

shados10y ago

point 1: merge smaller features more often. Multi-thousand line PRs suck no matter if they're squashed or not.

point 2: As people mentioned, this is for PRs done by individual people, usually squashing their local history. If multiple people were working on the branch, they should have been PR-ing against that branch (squashing the commits), then you merge that branch to master (you may rebase it, but not do significantly destructive stuff).

Even then, these are features to use with critical thinking. If your PR is really large, massage the history to be relevant and meaningful for git bisect purpose. Use full squashing when it was a local work full of "My hands are typing words" commits of no relevance.

I swear, we're in an industry full of people making 6 figure salaries, who treat their job as if it was a call center script to follow. If you're an engineer, use your head and engineer solutions around the tools you have.

exDM6910y ago

Your experience sounds horrible, but you were clearly doing something very stupid.

Squashing is a tool to use with discretion. You squash two or three 5 line wip commits together. And I'd never squash changes from two authors (except perhaps whitespace or comment fixes).

gjtorikian10y ago· 2 in thread

I mentioned this in a tweet[0] but we have a quasi-tradition of shipping on April 1st:

* https://github.com/blog/1815-l-is-for-labels

* https://github.com/blog/1451-branch-and-tag-labels-for-commi...

* https://github.com/blog/626-announcing-svn-support

[0]: https://twitter.com/gjtorikian/status/715972348860633088

technoweenie10y ago

The first time we announced svn support, it WAS a joke: https://github.com/blog/31-back-to-subversion

kylehotchkiss10y ago

I loved this feature too, but thought it was a joke at first!

gonyea10y ago· 2 in thread

I hate squashing branches. There's a lot of value in commit messages; they're educational and are a form of documentation. It's on the developer to squash the "oops" commits during a rebase, rewriting the commit message so it has some value when going back in time to look at changes.

I'd love to see a commit linter, that points at commits with text like "oops" and "fix my derp" to suggest possible commits to squash.

Git history shouldn't resemble a hot mess, but the evolution of code should be pretty granular. I'd take the hot mess over full squashing, though.

reledi10y ago

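Commits to squash can start with `squash!` (or `fixup!`) followed by the subject of the commit they belong to, and when you rebase with `git rebase -i --autosquash` they are automatically squashed.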
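In stock Git, the marker that `git rebase -i --autosquash` recognizes is a `fixup!` or `squash!` prefix followed by the target commit's subject. `git commit --fixup=<commit>` (or `--squash=<commit>`) writes that subject for you. Below is a minimal sketch in a throwaway repository. The file names and commit messages are made up.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repository so the sketch runs anywhere.
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
git config commit.gpgsign false

echo base > app.txt && git add app.txt && git commit -q -m "Initial commit"
base=$(git rev-parse HEAD)

echo feature >> app.txt && git commit -q -am "Add feature"
target=$(git rev-parse HEAD)
echo docs > README && git add README && git commit -q -m "Document feature"

# A late fix for "Add feature": --fixup writes the subject "fixup! Add feature".
# (--squash=<commit> writes "squash! ..." and keeps the message for editing.)
echo fix >> app.txt && git commit -q -a --fixup="$target"

# --autosquash moves the fixup! commit directly after its target and marks it
# "fixup"; GIT_SEQUENCE_EDITOR=: accepts the generated todo list unchanged.
GIT_SEQUENCE_EDITOR=: git rebase -q -i --autosquash "$base"

git log --format=%s
```

Setting `rebase.autoSquash` to `true` in your git config makes `--autosquash` the default for interactive rebases.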

codingWithGit10y ago

If I feel like the history of a branch is important, I add a git tag to the branch head before rebasing. I have a naming convention for these tags. Then I usually do a rebase -i on to the main branch.
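Here is a sketch of the tag-before-rewrite habit. It assumes a hypothetical `archive/<branch>/<date>` naming convention, since the comment doesn't say what the real one is. So that the sketch runs unattended, `git reset --soft` stands in for the interactive `git rebase -i` step.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repository so the sketch runs anywhere.
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
git config commit.gpgsign false

echo base > app.txt && git add app.txt && git commit -q -m "Initial commit"
git checkout -q -b feature-x
echo a >> app.txt && git commit -q -am "WIP"
echo b >> app.txt && git commit -q -am "WIP, meeting"

# Hypothetical convention: archive/<branch>/<date>. The tag pins the
# pre-rewrite tip so it stays reachable (and is never garbage-collected).
tag="archive/feature-x/$(date +%Y-%m-%d)"
git tag "$tag" feature-x

# Rewrite the branch into one commit (stand-in for `git rebase -i master`).
git reset -q --soft master
git commit -q -m "Add feature X"

git log --oneline master..feature-x   # the curated history
git log --oneline "master..$tag"      # the original history, still available
```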

gus_massa10y ago· 2 in thread

Perhaps I'm greedy, but I'd also like an additional option to rebase without squashing ...

Also, it's not clear if it is possible to disable the merge button completely. I prefer to use the command line to rebase and fix the details in the commits, but the big green "merge" button is always too tempting and it's easy to press it by mistake.

jakub_g10y ago

This is a huge issue for me both with GitHub and Atlassian Stash.

On both, you can't even see when the branch was started, and you can merge just fine as long as there are no merge issues. Then you run gitk and it looks like spaghetti horror, with trivial branches that were started 200 commits ago.

I don't necessarily need a "rebase&merge" button but at least info about the shared ancestor with master.

codingWithGit10y ago

You can always do this on the command line locally.

ianleeclark10y ago· 2 in thread

This is a feature I've wanted for such a long time. While it's perfectly feasible to do through the command line, I've always found myself having to force push to update the history on GitHub's side of things.

I typically did it through git rebase -i HEAD~N, so maybe someone here on HN knows of a better way to squash commits when you're updating remote history. That said, it seems that updating remote history with a squashed commit isn't entirely attractive behavior, and that's why I was forced to force push.

klodolph10y ago

Two things you can do differently, if you feel this would be appropriate.

Instead of git rebase -i HEAD~N, I typically use git reset --soft HEAD~N followed by git commit.

Instead of git push --force, use git push --force-with-lease. This only updates the remote if the remote's current state is what you expect. For example, it will fail if someone else has pushed to the same branch in the meantime.
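The sketch below combines both suggestions. A local bare repository stands in for GitHub. `N=2` and the branch name `feature` are arbitrary choices.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway setup: a bare repo plays GitHub, a clone plays your laptop.
work=$(mktemp -d)
git init -q --bare "$work/origin.git"
git clone -q "$work/origin.git" "$work/laptop" 2>/dev/null
cd "$work/laptop"
git symbolic-ref HEAD refs/heads/feature
git config user.name "Demo User"
git config user.email "demo@example.com"
git config commit.gpgsign false

for n in 1 2 3; do
  echo "step $n" >> app.txt
  git add app.txt && git commit -q -m "WIP $n"
done
git push -q origin feature

# Squash the last N=2 commits without an editor: move the branch back two
# commits while keeping their changes staged, then commit once.
git reset -q --soft HEAD~2
git commit -q -m "Add steps 2 and 3"

# A plain push would be rejected (non-fast-forward). --force-with-lease only
# overwrites the remote branch if it still matches origin/feature locally.
git push -q --force-with-lease origin feature
git log --oneline origin/feature
```

Suppose someone else had pushed to `feature` since your last fetch. The final push would then be rejected instead of silently discarding their commits.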

azernik10y ago

Usually I'll do git rebase -i origin/master (or upstream/master, or wherever it branched off from in the first place). It doesn't require me to count up to N commits, and it also does an actual rebase.

But yeah, force push is kind of inherent to the process - you're rewriting history, no two ways about it. Usually for my own forks of projects, though, I'll go into .git/config and add a 'push' option with a + in front of the refspec - this enables force pushing always. This really only works and is safe for workflows where there's a distinction between your own personal GitHub account and the upstream that has the authoritative copy - you really don't want to rewrite history by accident on the latter.
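Here is a sketch of that fork setup. It assumes a remote named `fork`, which is a local bare repository here. Local `master` stands in for `origin/master` or `upstream/master`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway setup: a bare repo plays your personal fork.
work=$(mktemp -d)
git init -q --bare "$work/fork.git"
git init -q "$work/repo" && cd "$work/repo"
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
git config commit.gpgsign false

echo base > app.txt && git add app.txt && git commit -q -m "Initial commit"
git checkout -q -b topic
echo topic > topic.txt && git add topic.txt && git commit -q -m "Add topic"

git remote add fork "$work/fork.git"
# The leading '+' makes every plain push to this remote a forced push of all
# local branches to same-named branches. Only sensible for a personal fork.
git config remote.fork.push '+refs/heads/*:refs/heads/*'
git push -q fork

# Upstream moves on (stand-in: a new commit on local master).
git checkout -q master
echo upstream > other.txt && git add other.txt && git commit -q -m "Upstream change"
git checkout -q topic

# Rebase relative to the branch point instead of counting HEAD~N
# (add -i to reorder or squash along the way).
git rebase -q master

# topic was rewritten, yet a plain push succeeds thanks to the '+'.
git push -q fork
```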

lotyrin10y ago· 2 in thread

Any plans to allow squashing with a merge commit instead of a fast-forward, or fast-forwards without squashing?

CI will still run against the hypothetical merge commit, no? I wonder if there are edge cases where merge vs squash+fast-forward would result in different conflict resolutions and different trees, so master could end up with a tree that didn't have tests run against it.

dlubarov10y ago

> master could end up with a tree that didn't have tests run against it

Isn't that already the norm? I.e., most projects run CI on an unmerged PR, then merge to master, then run CI on master to see if the merged tree is actually good. Are there projects which test a PR's post-merge tree before updating master? That would be nice, but I don't know of any tools which support that workflow, and it wouldn't scale forever since it would involve a serial merge queue.

hkdobrev10y ago

Travis CI, for example (and I'd imagine most CI servers), runs tests on the `MERGE_HEAD` of the PR, which, like lotyrin said, is the hypothetical merge commit. If a PR is opened from the latest master, CI runs on `MERGE_HEAD`, which is the same as the merge commit after the PR is merged. But if something was pushed to master after the feature branch was branched off, then it's not the same commit. That's why it's recommended to run CI again after merging. But by default, tests are not run on the PR HEAD, which is the tip of the feature branch.
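GitHub publishes the hypothetical merge for each pull request as `refs/pull/<number>/merge`, and the branch tip as `refs/pull/<number>/head`. The sketch below shows locally why the two trees differ once master moves on. Branch and file names are made up, and the `test -f` line stands in for a real test suite.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repository so the sketch runs anywhere.
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
git config commit.gpgsign false

echo base > app.txt && git add app.txt && git commit -q -m "Initial commit"
git checkout -q -b feature
echo feature > feature.txt && git add feature.txt && git commit -q -m "Add feature"
git checkout -q master
echo hotfix > hotfix.txt && git add hotfix.txt && git commit -q -m "Hotfix on master"

# On GitHub the equivalent commit can be fetched with
#   git fetch origin pull/<number>/merge
# Here it is built locally on a detached HEAD, so no branch moves.
git checkout -q --detach master
git merge -q --no-ff --no-edit feature
merged_tree=$(git rev-parse 'HEAD^{tree}')

# Stand-in for the CI suite: the merged tree has both sides' changes,
# while the PR head alone lacks the hotfix.
test -f feature.txt && test -f hotfix.txt && echo "post-merge tree passes"
git checkout -q master
```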

abalashov10y ago· 2 in thread

Call me naive, but wouldn't this problem be best solved by requiring that commits represent meaningful and working increments of work?

I use 'git add -p' judiciously and only commit when I've reached a point where something can usefully be said to be in some way "done". Sure, it's not perfect, and occasionally I end up having to clean up miscellaneous printf statements, debug values or typos in subsequent commits, but this is something that should really be avoided if possible.

cortesoft10y ago

Waiting until you have something useful has other drawbacks. For example, it might take many hours or days to get to a 'useful' and 'done' state - I don't want to go hours or days without saving my work in a manner that is easy to retrieve if something goes wrong.

Besides the obvious 'hard drive failure' or 'laptop stolen' situations, there are also more frequent situations where 'oh shit, I went down a totally wrong path there - let me back up a bit and try that again.' Git commits are little save points that let me do small experimentation and go back if I need to.

abalashov10y ago

Very good points. I guess I've just been too lazy to learn advanced rebasing / commit aggregation. Maybe I should, so I can more effectively take advantage of the upside while not polluting my commit history with intermediate crap.

Afforess10y ago· 1 in thread

Nice to see Github catching up to Gitlab[1]

[1]: http://feedback.gitlab.com/forums/176466-deprecated-feedback...

sytse10y ago

For sure! Good thing we have some great stuff in the pipeline https://about.gitlab.com/direction/ :)

pkamb10y ago· 1 in thread

I'm waiting for the code review tool that takes a massive squashed PR and cuts it up into a series of clean, atomic commits that can be reviewed individually.

neandrake10y ago

I believe Phabricator supports this workflow. Create a revision for a code change - as more commits are made, update the revision with those changes. Users can see what changes were made between commits and hold discussions on them. When it's ready, landing can (optionally) squash the diffs into a single commit upstream. The main history will be linear and more-complete changes, while detail about the development of the change is retained in Phabricator revision.

https://phacility.com/phabricator/differential/

wandernotlost10y ago· 1 in thread

This obsession with "clean" history seems to me nothing short of insanity. Source control has one job: to record the history of code in order to aid in figuring out what happened when things go wrong. If everything always went well, we'd just have great merge tools and throw away the history.

I can understand a desire to filter out stages of a project by different levels of review (I just ran the build and tests passed so I committed vs a bunch of people reviewed it so I merged), but that people solve that by deleting or rewriting history to be something different from what actually happened is just nuts.

Is it just that git makes it easier to change history than to add metadata for filtering? Why have we not seen presentation tools to solve this problem rather than what seems like a ubiquitous readiness to alter and throw away the messy but accurate facts about what happened?

jedberg10y ago

Because the lack of squashing actually makes finding bugs harder. Which checkin with the status "change i to l" or "whoops, typo" was the bug in?

It's a lot easier when an entire changeset has a single checkin into master, because then when you're doing your bisection there are far fewer changes to bisect.
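`git bisect run` automates this search. It checks out commits between a known-good and a known-bad point and classifies each one by a command's exit status (0 means good, 1 means bad). The sketch uses 20 one-line commits, and hypothetical commit 13 plants a `BUG` marker. The grep stands in for a real test. Bisect always lands on a single commit, so how much code it hands you depends on how big that commit is.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repository so the sketch runs anywhere.
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
git config commit.gpgsign false

# 20 small commits; commit 13 introduces the (simulated) bug.
for i in $(seq 1 20); do
  if [ "$i" -eq 13 ]; then echo "BUG" >> app.txt; else echo "line $i" >> app.txt; fi
  git add app.txt && git commit -q -m "Change $i"
done

# HEAD is bad, the root commit is good; the test command exits 1 when the
# bug is present (bad) and 0 otherwise (good).
first=$(git rev-list --max-parents=0 HEAD)
git bisect start HEAD "$first" >/dev/null
git bisect run sh -c '! grep -q BUG app.txt' >/dev/null
culprit=$(git rev-parse refs/bisect/bad)
git bisect reset >/dev/null 2>&1

git log -1 --format=%s "$culprit"
```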

devit10y ago· 1 in thread

The key principle is that the software should always correctly work at any point in the chain of commits, so you must squash commits that are "oops, fix X in previous commit".

Once that is satisfied, commits should be as small as possible, so that information about the grouping of changes is preserved.
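One way to check this principle mechanically is `git rebase -i --exec <cmd>`. It runs a command after each rewritten commit and stops at the first one where the command fails. The sketch below uses a made-up `build.sh` as the "build", and the second commit is an "oops" fix for the first.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repository so the sketch runs anywhere.
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
git config commit.gpgsign false

printf 'set -e\necho building\n' > build.sh
git add build.sh && git commit -q -m "Initial commit"
git checkout -q -b feature
echo 'cat config.txt' >> build.sh && git commit -q -am "Read config"
echo 'debug=0' > config.txt && git add config.txt && git commit -q -m "oops, add config.txt"

# Replay master..feature, running the build after every commit.
check() { GIT_SEQUENCE_EDITOR=: git rebase -q -i --exec 'sh build.sh >/dev/null 2>&1' master; }

broken=""
if ! check 2>/dev/null; then
  broken=$(git log -1 --format=%s)   # the commit the build failed on
  echo "does not build: $broken"
  git rebase --abort
fi

# Fold the "oops" commit into the commit it fixes, then check again.
git reset -q --soft master
git commit -q -m "Read config from config.txt"
check 2>/dev/null && echo "every commit builds"
```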

chmaynard10y ago

This advice seems very sensible. Thanks for adding some valuable ideas to the discussion. I'm now reading the git documentation on re-writing history:

https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History

macrael10y ago· 1 in thread

Do any VCSs have a notion of a commit of commits? If you could group a series of sequential commits into one commit on trunk it seems like you could have the best of both worlds: an overarching commit for your change and a series of how the sausage got made.

tedmiston10y ago

Sure, a merge commit.

    git checkout master
    git merge --no-ff feature-branch

simplify10y ago· 1 in thread

This is great, but I was really hoping for a fast-forward-only merge button...

kevincox10y ago

I think you can fake this by enabling "Require CI" and not selecting any required CI branches.
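I can't confirm that GitHub settings trick. On the command line, though, a fast-forward-only merge is `git merge --ff-only`, which refuses rather than creating a merge commit. A sketch with made-up branch names:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repository so the sketch runs anywhere.
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
git config commit.gpgsign false

echo base > app.txt && git add app.txt && git commit -q -m "Initial commit"
git checkout -q -b feature
echo feature >> app.txt && git commit -q -am "Add feature"
git checkout -q master

# master is an ancestor of feature, so master simply moves forward.
git merge -q --ff-only feature

# Now master and feature diverge...
echo hotfix > hotfix.txt && git add hotfix.txt && git commit -q -m "Hotfix"
git checkout -q feature
echo more >> app.txt && git commit -q -am "More feature"
git checkout -q master

# ...so --ff-only refuses; rebase the branch first, then fast-forward.
if ! git merge -q --ff-only feature 2>/dev/null; then
  echo "not a fast-forward; rebasing feature onto master first"
  git rebase -q master feature   # checks out feature and rebases it
  git checkout -q master
  git merge -q --ff-only feature
fi
git log --oneline
```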

tomphoolery10y ago· 1 in thread

Can't wait to see this feature replicated poorly in about 6 months in Atlassian Stash (Bitbucket)!

kannonboy10y ago

Actually Bitbucket Server (née Stash) already supports squashing and a variety of other pull request merge strategies. They're just configured[0] per repository, project or globally rather than an explicit option at merge time.

[0]: https://confluence.atlassian.com/bitbucketserver/bitbucket-s...

chmike10y ago

This is awful because it destroys information.

A feature branch needs cleaning up before publishing. Bug and typo fixes need to be squashed in the branch, but the history should preserve a sequence of commits corresponding to atomic changes, because it tells a story.

A commit should be small enough to make it easy to check if the change is correct. It should be self-contained so that it can be moved around, cherry-picked, etc.

This is why the local commit history should be considered as a draft of the story, and the one published should be the official one aimed to be easily readable, verifiable and manipulable.

looneysquash10y ago

I'm confused. When I don't want to see the details of the topic branches, I run 'git log --first-parent'.

Why not make it easier to see what I want to see, but then let me drill down and see more, instead of removing the details all together?
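The sketch below shows the "summary first, drill down later" view. It assumes a topic branch merged with `--no-ff`, so that the merge commit exists.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repository so the sketch runs anywhere.
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
git config commit.gpgsign false

echo base > app.txt && git add app.txt && git commit -q -m "Initial commit"
git checkout -q -b topic
echo a >> app.txt && git commit -q -am "WIP"
echo b >> app.txt && git commit -q -am "Fix typo"
git checkout -q master
git merge -q --no-ff -m "Merge branch 'topic'" topic

# Summary view: follow only the first parent of each merge, i.e. the
# history of master itself.
git log --oneline --first-parent master

# Drill down into one merge: commits reachable from its second parent (the
# topic tip) but not from its first parent (master before the merge).
git log --oneline master^1..master^2
```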

ltbarcly310y ago

No, don't squash your commits. This is stupid advice that comes back over and over.

Squashing commits is a useless thing that has absolutely no benefit. It's dumb. It really makes no sense. It has very clear negatives.

https://news.ycombinator.com/item?id=5631184


cyphar10y ago

Yeah, this is dumb. Being able to effectively bisect large projects depends on having smaller commits. If your commits are all huge, your bisect will hit some 500 line patch that does 5 different things -- it'll easily quadruple the debugging time. Not to mention that having small self-contained commits is a good thing.

golergka10y ago

This, like rebase, is a great idea for people who would rather use SVN but have to use git for some reason or another.

I mean, if you want your history simplified into a linear squashed series of commits, why the hell do you use a VCS that models history as a directed acyclic graph in the first place?

overgard10y ago

Maybe I'm missing something, but I've never understood the point of cleaning up commit history. The only time I ever look at commit history past a few days is if I want to know when something broke (in which case I WANT every part of the history, even the messy bits), or if I'm trying to refresh my memory of what I've done for the year (for performance reviews or whatever) in which case I guess it's marginally useful, but it's pretty easy to skim over "fix build" and such.

dboreham10y ago

As this thread approaches 300 posts I'm wondering when we're going to get out of this Git Tarpit we've somehow got ourselves into?

To be charitable: Git seems to be a good tool designed for Problem A, being widely used for Problems B,C,D for which it is a fairly poor choice.

I _think_ we got here through some mix of "but it's really fast!"; "I can do _anything_!!"; "But Linus says it's great!"; "I don't need to pay for a beefy server any more!"; "New _must_ be good, right?".

In any event, how do we get out of the ditch and back to work making software vs. trying to reason about the unreasonable? Personally, for the kind of projects I'm involved with (small teams, all paid by the same piper, with aligned clear goals and competent coders), I've had perfectly satisfactory revision control systems since around 2000 (except when required by an employer to use ClearCase...). It would be nice to get back to that future.

nullc10y ago

Annoying: you can't turn off both. If your project has a workflow where the webui merge should not be used (e.g. using signed merges) there is still no way to achieve that.

cfontes10y ago

Wow, loved this.

I contributed a little to JUnit, and they ask you to squash your commits before making a pull request. It took some time to do; it's so confusing/weird using regular Git.

LegNeato10y ago

Awesome! Phabricator got me addicted to the clean history of squashing commits and the logical changes staying together in source control (as opposed to grouping in a PR).

hinkley10y ago

It seems to me that the scope of this article is pretty narrow (in a good let's avoid a flame war way). And it describes one of the few scenarios where I think squashes are beneficial.

But then the conclusion doesn't quite add up. If I want to remove all the merge mess from a PR, don't I actually want to rebase the PR, not merely squash the history? Or did I miss the point he was making?

hvmonk10y ago

This is such a nice feature. Thank you for working on it. Keeping history clean is important, especially in enterprise solutions where you have to keep supporting multiple releases. You want to select certain commits/features for one version. With merge/squash, you get a cleaner history, and it is easy to pick the commits you want.

DMOoO910y ago

I believe there should be more open communication, and a record the public can fall back on, while these developers make their improvements. Something in their face. Too many times I have tracked back and seen unprofessional comments being made and things being done that seemed downright suspicious.

If it were up to me, it would be a lot harder to get a developer's license, and you would have to meet regulatory standards and hold degrees to uphold your professionalism while acting as your own developer. It seems to me that a lot of developers have taken things into their own hands and are trying to make a quick buck any way they can. Don't be surprised if you tell your mother, father, or sibling to look at what licenses they have agreed to on their phones and they find a lot of outdated, unassigned licenses to back up their privacy. It seems to be an epidemic, and how are we going to stop it?

cthulhujr10y ago

I could see this replacing the git workflow of creating a temporary branch that's used just for merging pre-master.

pkamb10y ago

After completing a feature (or part of one) I often "squash" manually by performing a mixed reset from the tip of the in-development branch back to a good parent commit.

This leaves all of the feature's changes in your working copy, which you can then stage by line/hunk and individually commit in clean, atomic pieces with the benefit of already having written the final code.

This removes the in-progress commits like a Squash, but pieces of code can still be brought in as individual easy-to-review commits. And it's potentially easier to understand / perform manually than rewriting history via a git command that would do the same thing.
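Here is this workflow as a sketch. It assumes the last good commit is the branch point, stored in `$good`. The mixed reset is `git reset` with no mode flag. Whole files are staged here so the sketch runs unattended; `git add -p` would let you choose individual hunks instead.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repository so the sketch runs anywhere.
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
git config commit.gpgsign false

echo base > app.txt && git add app.txt && git commit -q -m "Initial commit"
good=$(git rev-parse HEAD)
git checkout -q -b feature
echo helper > lib.txt && git add lib.txt && git commit -q -m "WIP"
echo "uses helper" >> app.txt && git commit -q -am "WIP again"
echo "helper v2" >> lib.txt && git commit -q -am "oops"

# Mixed reset: the branch and the index go back to the good parent, while
# every change from the WIP commits stays in the working tree.
git reset -q "$good"
git status --short

# Re-commit in clean, reviewable pieces.
git add lib.txt && git commit -q -m "Add helper library"
git add app.txt && git commit -q -m "Use helper in app"
git log --oneline
```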

MaxfordAndSons10y ago

This is great! But what about fixups? I often prefer a fixup to a squash when the commit in question was a WIP or something, and I don't have any desire to preserve the message in the final commit.

greg0ire10y ago

This is wrong on so many levels. Of course we don't care about "progress" or "woops I fucked up" commits, but people also make commits for things like coding style, and these commits should be kept separate from the others, yet they DO NOT warrant a separate PR. If it's about changing all tabs to spaces in a file, for instance, separate PRs will be hard to work with because of conflicts. Also, people who write WIP commits should learn how to squash them themselves and give them a meaningful commit message.

eblanshey10y ago

Usually when squashing branches locally, all the changes are set to myself as the author, effectively losing history of who did what in the feature branch. How does Github handle this?

Additionally, the lack of commit history for that branch often caused merge conflicts when merging the same branch again (if it had additional new fixes, for example). That's why I switched to using git merge --no-ff.

If it were very easy to do, I would definitely do a git squash, as it keeps history very clean. I just don't want to have the problems listed above.

jondubois10y ago

I think squashing commits makes sense if you're working in a big team and/or on a complex/large project - The main advantage of it is that it speeds up the QA process because it cleans up all the back-and-forth (exploratory) changes that tend to happen during development.

If you squash properly, each commit will represent a small standalone feature.

It does reduce your commit count though :(

kazinator10y ago

Squashing is not an alternative to merge workflow. It's what you do to clean up before you integrate your work (whether by rebase or merge). You've just made seven changes, which should just be three: git rebase -i HEAD~7, squash away. Okay, now you have three. rebase them on top of new work in the upstream branch, or merge? Separate question.

losvedir10y ago

This is great news for us as we prefer this flow. However, we actually mostly merge via our custom tool which uses GitHub's API. I can't find anything in the documentation about whether these options apply to API merges or if there's any query params that can achieve that behavior from the API. Anyone know anything about this?

mikebannister10y ago

I personally like this and it fits well with my team's workflow (though I am a bit concerned it will prevent engineers from learning to do these things with git). There are pros and cons for sure but I think if you are losing a lot of resolution by squashing then the scope of your pull requests might be too big in the first place.

cm310y ago

The option to avoid merge commits when merging pull requests misses the point, but apparently many users demanded it for some reason, so Github implemented it.

Here's how merge commits, rebasing and isolated small commits work:

You branch your topic off master and make many commits while working on it. This is all local.

Once you're ready to publish for review/integration, you squash fixup and backup commits into a coherent patch set, where each and every revision builds and works. This is where you can rebase and squash for good reason. Now you can push to Github.

If you, like Phabricator, create a single big commit with all changes lumped together, then it's impossible to bisect and follow the thought process behind changes. Try to git-bisect linux.git vs. a repo that's been managed with Phabricator and its mega commits.

With small commits that each make one coherent change, you can easily include relevant explanations in the commit message, which is much harder to achieve with a single mega commit. Further, it's very simple to follow along the development process of changes with separate commits. If you have one big diff, it's hard to understand the changes of a branch, whereas reviewing small commits with an explanation in the message and the overall reduction in diff size makes it much, much easier to understand for reviewers.

With separate small commits you review each step and finally arrive at the complete feature implementation at the end. For someone who has to review code they didn't write, this frees up a huge amount of valuable reviewer time for actual reviewing, instead of spending it reverse-engineering the steps taken in a big diff.

Moreover, with multiple commits, you can easily approve of some of the commits, while requesting improvements for others.

Gerrit implements this well and the process is what linux and git and other projects use when reviewing big patch sets. Set is the important word.

Finally, why do you want merge commits? Unless you always make a single mega commit à la Phabricator or the new Github feature, having merge commits provides a very practical way to see that a set of commits landed via foo-branch-X. If you've ever viewed a git log graph, those are the interesting integration points, which you will lose if you omit merge commits. In a merge commit you can also include extra stuff as part of the merge commit itself, so it's not just useless metadata.

exratione10y ago

Equally, https://github.com/git-land/git-land

"This is a git extension that merges a pull request or topic branch via rebasing so as to avoid a merge commit."

dclowd990110y ago

Rarely do conversations around these parts get as heated as they do when git process comes up. As I read through these comments, only one thing surfaces: everybody organizes their shit differently. If you're here trying to sell your process, why?

educar10y ago

A bit OT but a feature I would love to see in git is to be able to see which branch a commit came from. Especially for shortlived branches which we prune periodically on the server (git branch -r --contains won't work as a result)
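There is a workaround when the pruned branch was merged with a real merge commit. List the merges on the ancestry path from the commit to master, oldest first. The first one is usually the merge that brought the commit in, and its message often records the branch name (GitHub's reads "Merge pull request #N from user/branch"). This finds nothing for squashed or fast-forwarded work. A sketch with made-up branch names:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repository so the sketch runs anywhere.
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
git config commit.gpgsign false

echo base > app.txt && git add app.txt && git commit -q -m "Initial commit"
git checkout -q -b short-lived-fix
echo fix >> app.txt && git commit -q -am "Fix crash"
fix=$(git rev-parse HEAD)
git checkout -q master
git merge -q --no-ff -m "Merge branch 'short-lived-fix'" short-lived-fix
git branch -q -D short-lived-fix   # the branch is pruned

# More history lands on top, including an unrelated merge.
echo later > later.txt && git add later.txt && git commit -q -m "Later work"
git checkout -q -b other
echo x > x.txt && git add x.txt && git commit -q -m "Other work"
git checkout -q master
git merge -q --no-ff -m "Merge branch 'other'" other

# Merges that descend from the commit and lead to master, oldest first.
introduced_by=$(git log --ancestry-path --merges --reverse --format='%h %s' "$fix"..master | sed -n '1p')
echo "$introduced_by"
```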

0XAFFE10y ago

This always reminds me of stackenblochen https://youtu.be/Qo_2ReMNzhU

DMOoO910y ago

I can't believe some of the comments I have seen made back and forth. Let's aim for a professionalism that supersedes all others.

lerax10y ago

A unique thing: http://i.imgur.com/ztUufvf.jpg

kinofcain10y ago

Fantastic.

Regarding non-merge workflows: For me the question is:

should git history reflect a literal record of keystrokes or should it reflect intent?

I strongly believe in the latter.

DMOoO910y ago

No reason why developers should not be held responsible for their work. If you ask me, there should be a process for handing out open licenses, and from what I have seen, Apache gives anyone the right to do what they want. I would love to see stricter laws when it comes to third parties and open source licensing, including the chatter back and forth.

mck-10y ago

An added benefit of this would be that it's easier to git revert a merge from the CLI.
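The sketch below shows the difference, using made-up branches. A merge commit has two parents, so `git revert` needs `-m 1` to say which side is the mainline. A squashed PR is an ordinary one-parent commit. Git's documentation also warns that after you revert a merge, merging the same branch again will not by itself bring the reverted changes back.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repository so the sketch runs anywhere.
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
git config commit.gpgsign false

echo base > app.txt && git add app.txt && git commit -q -m "Initial commit"

# Branch A lands as a merge commit.
git checkout -q -b feature-a
echo a > a.txt && git add a.txt && git commit -q -m "Add a"
echo a2 >> a.txt && git commit -q -am "Tweak a"
git checkout -q master
git merge -q --no-ff -m "Merge branch 'feature-a'" feature-a

# Reverting a merge needs -m 1: undo it relative to the first parent
# (the mainline), i.e. drop everything the branch brought in.
git revert --no-edit -m 1 HEAD >/dev/null

# Branch B lands as a single squashed commit.
git checkout -q -b feature-b
echo b > b.txt && git add b.txt && git commit -q -m "Add b"
echo b2 >> b.txt && git commit -q -am "Tweak b"
git checkout -q master
git merge -q --squash feature-b >/dev/null
git commit -q -m "Add b (squashed)"

# A squash commit has one parent, so a plain revert is enough.
git revert --no-edit HEAD >/dev/null
git log --oneline
```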

totally10y ago

Wait, this isn't new.


WorldMaker10y ago· 71 in thread

phasmantistes10y ago

I actually disagree. Large teams that still have linear commit histories doesn't mean it is a lie. It means that the code review process is more important that the code writing process.

sulam10y ago

caveat: I was responsible for code review for 2000+ developers.

I call this "telling a story" with your commits. There's a lot of value in that style if you have the time to do it.

The other style of commit-by-commit reviewing, where I see all of the work in progress commits, I don't find valuable at all and I _definitely_ don't want to see on master.


jldugger10y ago

> It means that the code review process is more important that the code writing process.

And then, when the review is over, having discrete commits makes git bisecting down to the commit that broke the system more granular.


WorldMaker10y ago

«Making the code review the atomic unit of work, rather than the messy string of local commits, helps the project enforce common etiquette, commit formatting, and readable history.»


lotyrin10y ago

Justsignedup10y ago

itaysk10y ago


jasode10y ago

Interesting choice of words.

WorldMaker10y ago

«If squashing those commits is a lie, the Backspace key without an audited keystroke log is also a lie.»


Stratoscope10y ago

> It's real and visceral and how software is actually made...

It's also extraordinarily cluttered, and it really gets in the way when someone later wants to do a 'git bisect' to track down when a bug was introduced.

Of course if there's a way to break up my final commit into more meaningful smaller commits, I do that rather than one monolithic commit.

Or if I check in a new version of some external library, add an API call that uses it along with its tests, and add UI code that calls the API, those may be separate commits in that order.

My goal is to make the public history useful to future developers.

marssaxman10y ago

I did a rebase once because there was a big mess I was trying to clean up in order to make a merge work. I would be surprised if anyone has ever cared about the details.


rtpg10y ago

But squashing makes git bisect even harder, since the chunks are bigger. If you have a very granular history, then you can see exactly which 15-line change caused the breakage.

(though the counterpoint is that maybe there are parts of the granular history that are just busted for other reasons)


zachrose10y ago

Pxtl10y ago

To me this still sounds like a display problem more than a data problem. The viewers could implicitly hide non-tagged commits and then you can expand them out to see the details if you should need.

stormbrew10y ago

geuis10y ago

Couldn't disagree more. Having worked extensively on teams on both sides of this issue, I can experientially state that a well-done git rebase and commit strategy is much more useful and helpful.

In terms of feature branches:

When another team member goes to code review their pull request, rather than having to examine multiple individual commits there is only a single one to examine and comment on.

Master history:

andersonvom10y ago

I agree that a well-done git rebase and commit strategy is much more useful. However, squashing everything will (most likely) lead to gigantic commits that are hard to reason about.

A better approach would be to create multiple small commits that work and are self contained. It's ok for commit N to depend on the preceding commit, but each N should be able to stand on its own.

WorldMaker10y ago

Anyway, to each their own, and I appreciate your preferences differ from mine.


teacup5010y ago

Have you tried making sense of a project 10 years old? 20? 40?

I can "experientially" state that squash throws away very necessary information for anyone trying to make sense of old code.


dsp123410y ago

> isn't pretty, but it is meaningful and will tell you a great deal about a project and its developers... I trust that.

Here is a made-up, but generally realistic, git log:

    git log | grep -i WIP

    mon 5pm - WIP, going to work on this from home
    tue 4:45pm - WIP, going to work on this from home
    wed 2:30pm - WIP, meeting
    wed 5pm - WIP
    thu Noon - WIP, working from the cafe on my laptop
    fri 5pm - WIP, working from home
    sat 3pm - WIP, heading home sick for the day