I prefer using a commit hook (specifically a prepare-commit-msg hook, since that's the one that can edit the message) to automatically prepend a Jira ticket number to each commit, so when you look at the history you'll see multiple commits grouped together under the same ticket prefix, while each commit still retains its intention. Knowing that commits will not be squashed encourages devs to make meaningful commits. I still advocate for cleaning up and squashing your own commits as you see fit with an interactive rebase before your branch is merged. Having discrete commits also helps when running git bisect to find when a bug was introduced, since you identify the specific commit instead of a whole merged feature.
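A minimal sketch of such a hook, assuming a branch naming scheme like `feature/PROJ-123-add-foo` (the ticket format, branch name, and file names here are all invented for illustration):

```shell
set -e
cd "$(mktemp -d)" && git init -q repo && cd repo
git config user.email dev@example.com && git config user.name Dev

# The hook extracts a ticket id (e.g. PROJ-123) from the branch name
# and prepends it to the commit message if it isn't there already.
cat > .git/hooks/prepare-commit-msg <<'EOF'
#!/bin/sh
# $1 is the path to the file holding the commit message
branch=$(git symbolic-ref --short HEAD 2>/dev/null) || exit 0
ticket=$(printf '%s\n' "$branch" | grep -oE '[A-Z]+-[0-9]+' | head -n1)
[ -n "$ticket" ] || exit 0
grep -q "$ticket" "$1" && exit 0
printf '%s %s\n' "$ticket" "$(cat "$1")" > "$1"
EOF
chmod +x .git/hooks/prepare-commit-msg

git checkout -q -b feature/PROJ-123-add-foo
echo hello > foo.txt && git add foo.txt
git commit -q -m "add foo"
git log -1 --format=%s        # prints: PROJ-123 add foo
```

Because the hook lives in `.git/hooks`, each developer has to install it locally (or you distribute it via `core.hooksPath`); it is not versioned with the repo by default.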
Your dev branch is clean because each merge-commit is a single commit per task.
So you can see which tasks were merged, in which order, and which files as a whole were changed in each task.
If, for debugging, code review, or any other reason, you need to look at specifics, you can step through the feature branch commit by commit to see what was changed and why.
It's the best of both worlds.
Similarly with merging dev into master/main: you get a release-by-release view of what files were changed in a single merge commit.
Following the rules described above means that you get a lot of context on why code was changed:
* the individual commit, which should contain a description of the change if the change is not self-explanatory or not 'intuitive'; the individual commit should consist of only one functional change
* the commits surrounding it, the feature branch (clearly visible because of the merge commits)
* the issue number on the commit itself and possibly in the merge commit
I've found all this context very informative for projects that are in maintenance mode and still need changes from time to time. Obviously, the higher the quality of the commit history, the higher the quality of the information you get out of it.
Meaning: if you put a rename with impact across the whole code base (because the original name just happened to bother you that day) together with a bugfix in the same commit, and the commit message is the very informative text 'fix', and the referenced issue mentions 'add support for blah' (but the commit obviously does not implement anything related to 'blah'), then... well, yeah, then how you organise your commit history does not really matter.
There are at least two types of commit in git: a savepoint and a version.
A savepoint is what happens during development on a branch. Git makes it super easy to make many, many savepoints throughout the day. These help you as a developer because it gives you something to fall back on if you make a mistake. But most of them should never be exposed to anyone not directly working on the branch.
A version is what you share with others. A version is a fully working version of the software that can be reasonably checked out and put through a release process at any time. Usually a version will be unit tested but not subject to the same rigorous tests as a release.
There is a direct analogy here with database transactions. Just replace version with transaction.
Often while working you will find it's possible to write the version commit right away. This is usually for more trivial fixes or in some cases when a commit is required for something like a database migration (when things need to be deployed in stages). Other times you will need to make several savepoints before you get to a new version. This is what rebase is for. Many of those savepoints don't belong on the master branch as they are often fixing stuff you haven't even committed to master yet.
Git has a few tools to help you defer rebasing until later. In particular you can make fixup and squash commits. These will be normal savepoint commits, but they will be labelled in a way that later you can issue an "autosquash" command to automatically rebase these into version commits.
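The deferred-rebase flow above can be sketched like this (file contents and messages are invented; `GIT_SEQUENCE_EDITOR=:` just accepts the generated todo list so the example runs non-interactively):

```shell
set -e
cd "$(mktemp -d)" && git init -q
git config user.email dev@example.com && git config user.name Dev

echo a > f; git add f; git commit -q -m "base"
echo b >> f; git commit -q -am "feature: add b"   # a "version"-worthy commit

echo fix >> f
git commit -qa --fixup HEAD     # savepoint, recorded as "fixup! feature: add b"

# Later: fold the fixup into its target in one non-interactive step
GIT_SEQUENCE_EDITOR=: git rebase -i --autosquash HEAD~2

git log --format=%s             # now just: "feature: add b", "base"
```

`git commit --squash <commit>` works the same way but keeps the savepoint's message for you to edit during the rebase, where `--fixup` discards it.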
Git provides a DAG and you can use a --no-ff merge to build your "version commit" from the sub history of its "savepoints". You can follow one parent of the merge to the next "version commit" or you can follow the other parent through the intermediate "savepoints" that built it step by step.
You can use --first-parent today for most git operations to get "clean views" no matter how complex the DAG web is beyond it. I think a lot of these debates would "go away" if more people and user interfaces defaulted to --first-parent and "drill down" navigation rather than firehose of the complete graph and confusing (but pretty) "subway diagrams".
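A small demonstration of that "drill down" view (branch and commit names invented): with a `--no-ff` merge in place, `--first-parent` hides the savepoints while keeping them reachable.

```shell
set -e
cd "$(mktemp -d)" && git init -q
git config user.email dev@example.com && git config user.name Dev

echo 0 > f; git add f; git commit -q -m "initial"
git checkout -q -b feature
echo 1 >> f; git commit -q -am "savepoint: first attempt"
echo 2 >> f; git commit -q -am "savepoint: make it work"
git checkout -q -                                  # back to the main branch
git merge -q --no-ff -m "PROJ-7 add the feature" feature

git log --first-parent --format=%s
# shows only "PROJ-7 add the feature" and "initial"; the savepoints
# remain reachable through the merge commit's second parent
```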
In my opinion it is best to squash all commits into one before rebasing it on top of the target branch. During this process any information that is considered important for the history can be preserved by leaving it in the commit body.
There's your problem. Code reviews should not allow such commits to pass through.
No, you didn't read the comment fully, or you only disagree with part of it. Because you clearly missed this part:
> I still advocate for cleaning up and squashing your own commits as you see fit with an interactive rebase before your branch is merged.
If you do that, you don't end up with 'small, poorly named commits'. Or if you do, you have a lazy programmer / an idiot programmer in the team.
Which certainly happens, but, they ruin everything. You can't start shooting down processes, languages, tools, or anything else in the programmer space __just__ because some moron who abuses it ends up in a bad place. You need to show that a tool / feature / process / hook / etc turns otherwise fine, capable programmers into idiots in order to advocate for its abolishment. Not the other way around, or you end up with a blunt rock and a club and are then debating that they're holding the club at the wrong end.
Anyway, some workmates use squash when accepting pull requests on GitLab/GitHub, as a general workflow suggested by those tools, in contexts where trunk-based development is not feasible.
Completely agree. And I suspect the increasing frequency of squash-merging is mainly to avoid having to do the work of cleaning up and commenting individual commits in a longer sequence.
I can see this both ways: it really is faster and easier to squash. And you’re right, it really does bury some context and functionally makes large changes harder to read, bisect, revert, or modify.
One benefit to squash merging that you might have overlooked is that it can encourage frequent (and messy) committing, knowing that the churn will disappear without having to work hard to clean it up. This does, in a way, make the git workflow more appealing and easier to manage for more people.
I've noticed the opposite. Developers who know all of their work will be smashed into one commit at the end tend to not commit as frequently, and the commits they do make are just checking in all of their work at intervals. It is more of a process of saving state. It doesn't matter how frequently they commit if all the commits will become one.
If people make small PRs, committing to mainline as they go, then squashing each PR fits well.
This is only the case if said squashing just bundles commits without context or consistent logic. If merges to a mainline branch consist of feature branches whose pull requests were already approved after a couple of iterations, then the end result is a cleaner commit with its history thoroughly audited. In practice it's equivalent to a fast-forward merge of a single-commit feature branch that just happened to be nearly lined up with mainline.
In other words, a commit to us is sort of like an "atomic" change, something that cannot be split or else more or less bad things happen.
I have trouble conceiving of a better way to use Git when you really care about the readability of your history. In some cases I don't care about readability, though. On hobby projects I sometimes use Git more like a file transfer and synchronization tool. In that case I don't give a hoot about what the history looks like.
Just like with code, the more readable this history is (in terms of what features/fixes are in there at some point in time), the better.
Sounds familiar. Lots of people just learn a few template use-cases, but don’t take the time to learn/study the tool. You might argue you shouldn’t need to, but Git isn’t intuitive and does require investment. It comes with its own terminology; there’s no point just guessing what a branch is, for example.
Like all powerful *nix CLI tools, if you don’t learn it you’ll shoot yourself in the foot one day. Even if you want to use a GUI, it’s no substitute for learning how git works behind the scenes.
Sometimes when I offer to help a colleague who's got themselves into knots, the sad thing is that often they can't even explain the problem they got into.
"Everybody has a plan until they get punched in the mouth."
-Mike Tyson
Surely we can do better than this? A tool being powerful should be no excuse for it having footguns!
For example, see "Why SQLite Does Not Use Git": https://sqlite.org/whynotgit.html
In my eyes it addresses many of the problematic bits of Git and offers alternatives to them. Admittedly, it was sad to see other VCSes like SVN die out, since there were some things that they did more clearly than Git (e.g. revision numbers), despite their other shortcomings.
Currently, Git is essentially forced upon many people, and they have to learn commands that they don't understand to use it, much like you said. If version control were easier and more intuitive, then more people would adopt it and the ones using it would be more efficient with it, rather than people messing up their repositories and others blaming them for it. Even though the blamers would be right, that sort of elitism also doesn't help much, since clearly some mistakes are easier to make than others.
I think that most of the systems out there will eventually be improved or rewritten until finally they are both powerful and usable, even Git probably isn't immune to this. Whether the incentives to do that are there now (since GitHub, GitLab and others are already de facto within the industry) will only affect whether this is done in the next 50 or 100 years.
Here's an issue that we ran into this week:
- at work, we have a "main" branch and a "development" branch
- a new "feature" must be based off of "main", but eventually moved into "development", since that's what the test environments are configured against
- thus, we have a "feature-development" branch, into which we merge "feature", to not sacrifice our ability to put "feature" back into "main" without the "development" changes, if ever needed
- then, we merge the "development" branch into the "feature-development" branch to solve any conflicts or do any refactoring that we need to ensure that those two branches play together nicely, before merging "feature-development" into "development"
- this week, someone did the opposite, they merged "feature-development" back into "feature", thus moving all of the "development" changes into "feature", which we can't allow
- the solution? who knows, one option would be to force push the earlier position of a branch, thus erasing the merge commit, but force pushes are considered a bad practice
- what we opted for in the end was having another commit that reverts the merge (through the GitLab UI), however now the history still shows that "feature" has 100+ commits in it
- the problem that we might run into down the road now is that when we merge "feature" into "feature-development", we'll probably also carry over the revert commit, which we'll then need to revert or do something so that the "feature-development" branch doesn't have all of the "development" changes essentially removed (which may or may not be true)
All of that pain, essentially caused by one bad merge, whereas other colleagues are now also asking me about rebasing. I don't feel like i know Git well enough to be the "go to guy" and would rather just write the code i need for the features, rather than worry about branching strategies, what colleagues have done, and weird client requirements for which branches must be used as a base because there are not enough test environments.

Furthermore, GitLab not having an easy way to say "i want this commit gone" is equally annoying. If they offer you reverts, surely they can also ensure the equivalent of a force push? Then again, chances are that it's too early for me to say anything vaguely accurate, and someone out there can probably solve all of it in a line or two. Nonetheless, it feels to me that an easy to use tool would expose solutions for the most common problems in an approachable way.
Of course we can, but they all lost the VCS wars, so we’re left with git: a chainsaw you can’t shut off, with hooked spikes bolted to the handle, and an old “handle with care” sticker long since come unstuck and slipped under one of the shop’s cabinets.
To be fair to GitLab, that would just be another way to say "force push".
Either you want the change gone from history, which makes it necessary to coordinate with everyone who has taken out a branch during that time (which is a force push) or you want the merge-and-backout to be visible in eternity (which is a revert commit). There's no way to have both, by definition.
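For the "visible in eternity" option, reverting the merge commit with `-m 1` (i.e. relative to its first parent) is enough, and no force push is involved. A toy reproduction of the scenario (branch names borrowed from the story above, contents invented):

```shell
set -e
cd "$(mktemp -d)" && git init -q
git config user.email dev@example.com && git config user.name Dev

echo base > f; git add f; git commit -q -m "base"
git checkout -q -b feature-development
echo dev-change > f; git commit -q -am "development work"
git checkout -q -
git merge -q --no-ff --no-edit feature-development   # the accidental merge
git revert --no-edit -m 1 HEAD                       # back out its changes

cat f   # prints: base — both the merge and the revert stay in history
```

One caveat from the git documentation on reverting faulty merges: merging the same branch again later will not bring those changes back, because the merge commit is still in the history; you would have to revert the revert first.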
The way to avoid screwups, in any version control system, is to have everyone actually read what they are about to persist to our shared history.
The by far most popular process to enforce that is code review. Any pull request which pulls in hundreds of unrelated commits hopefully won't get accepted. And if it does, there'll be plenty of time to evaluate why while coordinating the cleanup.
This is definitely a better solution and the _only_ time force push is okay.
(Well, unless this is a big public repo that has since moved on.)
I agree. In fact I’d even say they exist but yet they didn’t win.
I have often wondered why a tool that sits atop of git doesn’t exist. One that exposes some basic opinionated operations.
I believe that the issue you ran into would have been best solved with a reset and a force push, as long as you see it right away (before further commits are made on top) and as long as you can warn the whole team that an error happened and that it will be fixed with a reset + force push. At least that's the way I would do it. We use Bitbucket and disable reset + force push on the main branch, but allow it on feature branches. When it is really needed, we temporarily enable it on the main branch to do a fix and immediately disable it again.
Also, IMHO, if you have a main and a development branch, I think the feature branches should be based on the development branch and not on the main branch. Normally you want the feature branch to take into account the most recently merged code. To me it seems that if you want to make a new release of 'main' before the release of 'development', you are talking about a hotfix, and that should be exceptional. For a hotfix, you'd base the fix off the version with the bug, and then check whether the fix is also needed on development. Moving commits around like that with git can be done fairly easily using 'cherry-pick'.

But mainly my point is: all the different branches that you define and how you work with them is really your own choice; you can make it as hard or as simple as you want. It seems a lot of people are making that setup very complex, but remember that not everyone is working on a project used by 10 customers with 3 active major versions, each of which requires features and bugfixes. In a lot of cases, you can get away with one long-living main branch and short-living feature branches. As long as each release is properly tagged, you can make a bugfix or hotfix release for anything that you released before.

And, again in my opinion, the default should be that when a feature is merged, it means that it was ready for release and will be in the next 'regular' release.
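A toy version of that hotfix-plus-cherry-pick flow (file names, tags, and messages are invented):

```shell
set -e
cd "$(mktemp -d)" && git init -q
git config user.email dev@example.com && git config user.name Dev

echo buggy > app.txt; git add app.txt; git commit -q -m "1.0 feature work"
git tag v1.0                                 # the release is a tag
echo extra > feature.txt; git add feature.txt
git commit -q -m "development toward 1.1"

# base the hotfix on the released version, not on development
git checkout -q -b hotfix-1.0.1 v1.0
echo fixed > app.txt; git commit -q -am "fix crash in app"
git tag v1.0.1

# bring the same fix onto the development line
git checkout -q -
git cherry-pick v1.0.1 > /dev/null
cat app.txt   # prints: fixed — development now has the fix too
```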
> Surely we can do better than this? A tool being powerful should be no excuse for it having footguns!
I don't really understand what you mean here. Git is very powerful and allows you to manipulate commits in a lot of different ways, some of which are destructive... but that is part of its power. What are in your opinion the footguns and how would you remove them?
What I did notice is that a lot of people just seem to refuse to take the time to actually learn how git works and how they can work with it. A lot of people seem convinced that it is a much better use of their time to search for a git UI that hides all the complexities and shows them a simplified presentation (likely a bit closer to how they are used to working with other source control systems) than to spend the same time learning to use git. Personally, I use the tools that come with git: the git cli, gitk to visualize history and git gui to commit work. (I do like the UI built into JetBrains IDEs because it is pretty good and doesn't try to hide anything or invent new names for git terms.)
So several steps to implement a feature are mixed up into a single change :/ That looks awful to me when you need to read the history to understand how and why.
Commits are the first thing by default. If you squash and rebase them, they become the second thing (feature changes). But in doing so you throw out information about the original history of those changes.
This has always seemed silly to me. Git should just be changed to support both work flows. Keep commits as-is but let me mark a range of commits as being part of feature X, and let me browse the repo from the perspective of those larger feature objects. Or make it so merge / squash commits still reference the individual commit sequence, so I can see what happened in more detail if I want to.
That's a problem I avoid by spicing the log with commit messages from http://whatthecommit.com/ ! :)
Always rebase before pushing for review, just don't squash everything. One would hope that much would be self evident.
But even then, when you want to revert or cherry-pick a commit on another branch, you don't want random unrelated changes (which increase the risk of conflicts).
Also, a single commit containing 10 squashed commits doesn't help for git bisect.
That would fix a lot of problems imho
Consider the following:
[8 files changed] ISSUE-541 added DB migrations and repositories for the new functionality of managing Foos
[2 files changed] ISSUE-541 refactor the DB migrations to do Bar before creating one of the views because of it breaking otherwise due to Baz
[11 files changed] ISSUE-541 refactor old services to use the new Foo functionality, because it should be more consistent than using native queries with complex SQL
[3 files changed] ISSUE-541 fix problems with fetching data in some of the old services, because edge cases were not covered, add unit tests to cover this functionality
[7 files changed] ISSUE-541 add services for managing Foos, though they will only be used in ISSUE-544, add comments about this
[20 files changed] optimize the imports in the Foos package and remove unused code (should automate this later)
And then the following:

[3 files changed] DB migrations
[1 file changed] refactor
[4 files changed] refactor
[1 file changed] fix code, add tests
[2 files changed] services
[6 files changed] formatting
In one of the cases, the changes are larger, and therefore more meaningful descriptions of what each commit does are probably a good thing: if the refactoring of code would break anything that wasn't immediately apparent but was later detected, being able to click on the commit in the IDE history and see what exactly was changed as a part of it could be pretty useful! Whereas in the other case, fewer files are changed, so the pattern i've seen emerge is that most people won't care much about detailed commit messages, which makes them no longer that useful.
Of course, i've also seen people (the majority of my coworkers, though it depends on the company) not really care about commit messages or even making smaller atomic commits at all, thus a feature branch could look like the following:
[28 files changed] ISSUE-541
[11 files changed] refactoring
Then again, the majority of people in my current company also always leave merge request descriptions completely empty and expect you to figure out what the code does from the diff alone, without any context or further considerations. Personally, i'm against this, and while i don't want to sour our relationships by nagging about it, i always write out a bit of information about what each feature branch accomplishes, and even include a few images or GIFs.

That has made my own life much easier when something seemingly breaks 9 months down the line and no one has any idea why, whereas i can just look at the merge request for the offending code and see all of the charts and explanations i need.
Personally, i think that the closer your documentation (of any sort) is to your code, the better the end results.
Instead think of the master branch as the integration branch. This is where everything gets merged ready for a new release. But the master branch itself is not a release. You can automatically deploy the master branch to a staging or "next" environment if you wish.
For releases, use tags. That's what they are for. If necessary you can make one or more "maint" branches where you backport important fixes from the master branch on to release branches to create patch level release versions.
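A minimal illustration of releases-as-tags with a maint branch hung off a tag (version numbers and messages are invented):

```shell
set -e
cd "$(mktemp -d)" && git init -q
git config user.email dev@example.com && git config user.name Dev

echo a > f; git add f; git commit -q -m "work for 2.0"
git tag -a v2.0 -m "Release 2.0"   # the release is the tag, not master itself
echo b > f; git commit -q -am "work toward 3.0"

# patch releases branch off the tag, not off the moving master branch
git branch maint-2.0 v2.0
git log -1 --format=%s maint-2.0   # prints: work for 2.0
```

In a real repo you would also `git push origin v2.0` so the tag is shared; annotated tags (`-a`) carry a tagger, date, and message, which is why they are usually preferred for releases.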
When it comes to releases, maintenance branches like you suggest are a lot more flexible and allow you to support old versions. Releasing v1.1 after v2 doesn't really work if every release has to be a merge to master.
What I was taught almost 20 years ago at Sun Microsystems (RIP) is a rebase workflow. That was back before Git. We were using Teamware of all things, and we were using it with a clean, linear upstream history workflow.
Our particular rules were:
- linear history upstream
- absolutely no merge commits (at Sun they were called "merge turds")
- one commit per-bugfix, though one commit could fix more than one bug, with a separate commit for test changes
- one commit per-project
- but otherwise one push could push many commits
We also had rules about commit titling, naturally.

Sun had been using that workflow since 1992 as I recall, so they used that workflow for 16 years, with several thousand developers pushing to OS/Net (the core of Solaris).
We even had rebase --onto.
The workflow for projects went like this:
- devs push to a project clone of the upstream
- gatekeeper takes care of build issues and preps to push to upstream when the project is ready
- every so often the project repo ("gate") would rebase onto the upstream head, then devs would rebase their clones onto the new project head
- eventually the project would rebase onto and push to the upstream
- project repos ("gates") got archived when the projects completed
That workflow scales very, very well. The resulting upstream's history is very clean, with just: commits for bug fixes, commits for tests for bug fixes, commits for projects, commits for release-making, and the occasional follow-up to a commit that fixed minor issues with that commit (e.g., `12345 Crash in blah blah (fix style)`).

I strongly recommend it.
Many day-to-day problems with Git have one thing in common: the user has got their repo into some state they do not understand. And most times (as comments have mentioned) they don't even know what they did to get themselves into their current pickle.
The quickest path to resolution is usually a hard reset to the server version of the branch then restore the commit(s) that user had made.
If you have pending changes then handle those first. It doesn't matter much what they are; just commit or stash. If committing, don't even think about the message; just use `wip commit`. Finally, note every commit id that has work you want to keep around. Might be 1, 5 or 20 commits.
Okay, now hard reset.
`git reset --hard <remote_name>/<branch_name>`
Great, now you're at an acceptable state.
Finally, restore your previous commits: use cherry-pick or `git stash apply` to get your modifications back.
If cherry-picking, it's `git cherry-pick <sha_oldest> <sha_next_oldest> ...`
On to the next problem...
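Those steps played out as a runnable sketch, where a throwaway bare repo stands in for the server and the "pickle" is simulated by a single bad commit on top of one worth keeping (all names invented):

```shell
set -e
cd "$(mktemp -d)"
git init -q --bare origin.git
git clone -q origin.git repo && cd repo
git config user.email dev@example.com && git config user.name Dev

echo base > f; git add f; git commit -q -m "base"
git push -q origin HEAD
branch=$(git rev-parse --abbrev-ref HEAD)

# local mess: one commit worth keeping, then a bad state on top
echo good > g; git add g; git commit -q -m "good work"
keep=$(git rev-parse HEAD)                 # note the id of the keeper
echo oops > f; git commit -q -am "confused merge attempt"

# hard reset to the server's view of the branch, then replay the keeper
git reset -q --hard "origin/$branch"
git cherry-pick "$keep" > /dev/null
ls   # f and g are both present; the bad commit is gone from the branch
```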
1. Commit your local changes, squash if you want.
2. Do a pull --rebase
3. Push
If this doesn't work, it's your own fault. Start over.
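Those three steps, played out between two throwaway clones (names and file contents invented):

```shell
set -e
cd "$(mktemp -d)"
git init -q --bare origin.git

git clone -q origin.git alice
cd alice
git config user.email alice@example.com && git config user.name Alice
echo base > f; git add f; git commit -q -m "base"
git push -q -u origin HEAD            # -u sets the upstream for later pulls

cd .. && git clone -q origin.git bob
cd bob
git config user.email bob@example.com && git config user.name Bob
echo theirs > g; git add g; git commit -q -m "their change"
git push -q origin HEAD

cd ../alice
echo mine > h; git add h; git commit -q -m "my change"   # 1. commit local work
git pull -q --rebase                  # 2. replay it on top of upstream
git push -q origin HEAD               # 3. push
git log --format=%s                   # linear history, no merge commit
```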
You can pop or just apply a specific stash using
git stash { pop | apply } stash@{N}
This has worked as far as I remember; I have a 1.6 installation somewhere where I can confirm it, if necessary.

"Changes print messages again"
How helpful is that? In what way were they changed? What was the intention behind the change? Is there a ticket or a feature request behind it? Why was that particular change chosen instead of all the other ways to achieve the same effect?
It's also considered prudent to keep to a common format. These are even written in differing tenses, where imperative is by far the most commonly preferred.
I use it continuously throughout any workday, if I'm doing contributions that day.
It's also a good idea to set your pull strategy to rebase, as recommended here: https://sdqweb.ipd.kit.edu/wiki/Git_pull_--rebase_vs._--merg...
git config --global pull.rebase true

I've seen cases where people don't care in the slightest about the list of commits in any PR/MR, but instead just look at the diff view to see the code changes.
At least that's how it's been in every single project that i've ever been a part of. Most also don't use rebasing at all and don't squash commits into a single one for a particular PR/MR either.
That said, there is probably merit to using rebase and squashing as well, though many fail to see it or gain anything useful from it.