If there are no conflicts, you might as well rebase or cherry-pick. If there is any kind of conflict, you are making code changes in the merge commit itself to resolve it. Developers end up fixing additional issues in the merge commit instead of actual commits.
If you use merge to sync two branches continuously, you completely lose track of which changes were done on the branch and which were done on the mainline.
Merge is perfectly fine and it is the only way to synchronize repositories without changing the history, which is very important for a decentralized system. It certainly has the potential to make a mess if used improperly, but so do rebase, cherry-pick, and basically every other command.
> If you use merge to sync two branches continously, you completely lose track of what changes were done on the branch and which where done on the mainline.
If you do things correctly, that is by making sure that when you merge changes from a feature branch into the mainline, the mainline is always the first parent, you shouldn't have any problem. Git is designed this way, so normally, you have to go out of your way to mess things up. If you did it like that and you don't want to see the other branch's commits, git-log has the --first-parent option.
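To make the first-parent point concrete, here is a minimal Python sketch on a toy commit graph. The commit names (`m1`, `f1`, ...) and the `PARENTS` table are made up for illustration; this only models what a first-parent walk does and is not how git itself is implemented.

```python
# Toy commit graph: each commit maps to its parents, first parent first.
# All commit names are hypothetical.
PARENTS = {
    "m1": [],
    "m2": ["m1"],
    "f1": ["m1"],
    "f2": ["f1"],
    "m3": ["m2", "f2"],  # merge of the feature branch; mainline is first parent
    "m4": ["m3"],
}


def first_parent_log(graph, tip):
    """Walk back from tip, following only the first parent of each commit."""
    history = []
    current = tip
    while current is not None:
        history.append(current)
        parents = graph[current]
        current = parents[0] if parents else None
    return history


def reachable(graph, tip):
    """Every commit reachable from tip through any parent."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(graph[commit])
    return seen


print(first_parent_log(PARENTS, "m4"))  # mainline commits only
print(sorted(reachable(PARENTS, "m4")))  # mainline plus feature commits
```

If the parents of `m3` were swapped, the same walk would run through `f2` and `f1` instead of `m2`, which is the situation the mainline-as-first-parent rule avoids.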
Even though I much prefer a linear history, losing 1h or more to the tedious work of re-resolving the same conflict over and over is not worth it, in my opinion.
It seems like a very contrived example to me. We have been running rebase/fast-forward only for close to 10 years now, and I have never experienced anything that unfortunate.
Rebase is being annoying here mostly because it's doing exactly what you want it to do: warn you about merge conflicts for every commit in the chain that might have any.
Suppose "masterX+1" is called latest
Suppose "masterX" is the SHA of your mergebase with master (on top of which you have 10 commits)
`git rebase --onto latest masterX`
I wish it worked like merge. Or is there a way to merge, resolve the conflict, and then rebase?
Both tools are pure vandalism compared to merge. Among the two, cherry-picking is preferable in this case because you're "only" destroying your own history, so in the end, it's your funeral.
> Developer end up fixing additional issues in the merge commit instead of actual commits.
A merge commit IS an actual commit, in every sense of the word. The notion it somehow isn't, is what you need to get rid of.
I do agree that resolving conflicts in merges is risky though. It can make sense when merging just one way between permanent branches (e.g. a 1.x branch into a 2.x), but as soon as cross merges become a possibility it’s probably a mistake.
How do you do otherwise, though? Or is your workflow a combination of rebases and merges? Continual rebasing of the feature branch onto `main` and then a final merge commit when it's ready to go?
If you do have multiple devs working on the same branch, use `git pull --rebase` to stay in sync with each other, don't use merges and leave lots of merge commits. If you need to resolve conflicts with upstream, make sure other people have stopped working on the branch, rebase it, then merge.
Rebasing takes longer and is actually more prone to error because of the clunky interface. There is absolutely nothing wrong with squashing commits in a feature branch and merging that into master/main. In fact, it's generally better for the health of the repo and the mental health of developers.
* If you screw up a merge, you undo the merge commit. Now your branch is exactly as it was. That may not be the case with a rebase.
* If you push some code to the remote, and later find out it was outdated, you can merge it with main and push again: no need to force, GitHub can distinguish what's already been reviewed and what hasn't. With rebase, you may need to `push --force`, and if someone already reviewed the code they're going to be shit out of luck, as GitHub will lose the capability to review "changes since last review", as the reference it has may have been lost.
I also merge these features using squash commits, which provides a very linear history. This also saves some effort (you don't need to rebase the commits in the feature branch, which can be a pain in the ass for unorganized people and git newbies, and you are pushed towards making smaller, granular PRs that make sense for the repo history).
I like cherry-pick, but I barely use it (e.g., I need to cherry-pick one commit from branch X into my branch). I don't like rebase much because it requires force-push.
This way, you know which set of commits was in the branch by looking at the parent commits of the merge commit, but the merge commit itself did not involve any automated conflict resolution.
I don't get it. If you rebase, you get 20 chances to do the same.
Rebasing is the process of redeveloping your feature based on the current master. This gives you smaller, easier steps to review later.
It is a pity that we can't have tooling to create "hidden" merge commits to connect rebased branches; this would retain the history better and allow pulling more easily.
Also, a way to "rebase" that works the same as cherry-picking commits on top of the target. As far as I can see, the regular rebase works its way up the target branch, so that I end up resolving conflicts in code that eventually changed in the target.
As long as the merge commit is being reviewed with the rest of the PR, that's fine, right? (We use rebase while working on feature branches, and then squash & merge for completed PRs, which seems to be the best of both worlds)
If you do things the way you're suggesting, you'll make it really hard to tell what commits were made on your branch. Git clients tend to assume the first parent is the branch you care about.
We performed code review with a projector in our office jointly looking at diffs, or emacs.
Of course it’s neat to have GitHub actions now and pull-requests for asynchronous code review. But I learned so much from my colleagues directly in that nowadays obscure working mode which I am still grateful for.
We did have an ugly plush animal, but it served more obscure purposes. For blame of broken builds, we had an info screen that counted the number of times a build had passed, and displayed below the name of the person who last broke it.
Explaining to outsiders and non-developers that "Yes, when you make a mistake in this department, we put the person's name on the wall and don't take it down until someone else makes a mistake" sounds so toxic. But it strangely enough wasn't so harsh. Of course there was some stigma that you'd want to avoid, but not to a degree of feeling prolonged shame.
On another team I was on, in 2002 using CVS, we had an upside-down solo cup as a base for a small plastic pirate flag. If you were ready to commit, you grabbed the pirate flag as a mutex on the CVS server. Of course, this turned competitive… and piratical.
I despair about long-lived git feature branches and pull requests. The pull request model is fine for open source development, but it’s been a move backwards for internal development, from having a central trunk that a team commits to several times a day. The compensating factors are git’s overall improvements (in speed and in principled approach to being a content addressable filesystem) and all of the fantastic improvements in linters and static analysis tools, and in devops pipelines.
It's especially cool given that he would always see his employees' f*k-ups as learning opportunities. He would always teach them what went wrong and how to fix it before shaming them in the git history. He always told them he did it to ensure they wouldn't forget both the shameful f*k-up + the bit of learning that came along with it. They always laugh it off and understand the boss' intentions. It isn't harsh or anything.
With good infra, everything from unit tests to integration to acceptance tests gets run before code hits main.
The only excuse for builds breaking nowadays is insufficient automated safeguards.
In-person code review is the only way to do it. Pull requests optimize for the wrong part of code review, so now everyone thinks it's supposed to be a quality gate.
Most corporate code bases are written by a smallish team operating under tight time constraints so most contributions are actually improving on the current state of the code base. Then PRs delay the integration, and lead to all kinds of follow up activities in keeping PR associated problems at bay. For example the hours wasted by my team in using stacked PRs to separate Boy Scout rule changes to the code from the feature is just abnormal.
Look up trunk-based development and read the continuous integration book published by Addison-Wesley (is it the Jez Humble book or the Duvall book? I always confuse the authors; both books are great, though).
The hard part will be to convince people to explore a different working mode AND to learn that what is proposed is not an anarchist style of development but a development model that optimizes for efficiency.
All PRs are rebased and merged in a linear history of merge commits that reference the PR#. If you intentionally crafted a logical series of commits, merge them as a series (ideally you've tested each commit independently), otherwise squash.
If you want more detail about the development of the PR than the merge commit, aka the 'real history', then open up the PR and browse through Updates, which include commits that were force-pushed to the branch and also fast-forward commits that were appended to the branch. You also get discussion context and intermediate build statuses etc. To represent this convention within native git, maybe tag each Update with pr/123/update-N.
The funny thing about this design is that it's actually more similar to the kernel development workflow (emailing crafted patches around until they are accepted) than BOTH of the typical hard-line stances taken by most people with a strong opinion about how to maintain git history (only merge/only rebase).
The kernel needs a highly-distributed workflow because it's a huge organization of loosely-coupled sub-organizations. Most commercial software is developed by a relatively small group of highly-cohesive individuals. The forces that make a solution work well in one environment don't necessarily apply elsewhere.
With this, you can also push people towards smaller PRs which are easier to review and integrate.
The downside is that if you need to work on feature 2 based on feature 1, either you wait for the PR to be merged into main (easiest approach) or you fork from your feature branch directly and will need to rebase later (this can get messier, especially if you need to fix errors in feature 1).
This is a tooling issue that needs to be solved client-side (i.e. where the signing key lives). It's an important one but actually really simple.
> Please don't comment on whether someone read an article. "Did you even read the article? It mentions that" can be shortened to "The article mentions that".
The best way IMO is to interactive-rebase the branch locally (or force-push a rebased version later), but sometimes 50 commits merge into a 30-line single-file change and nothing beats squash.
How are you going to deal with non-trivial feature branches that need to be integrated into master? Squash them and commit? Good luck when you need to git bisect an issue. Or rebase and potentially screw up the integrity of the unit test results in the rebased branch? Both sound unappealing to me.
The problem is not a history with a lot of branches in it, it is in not knowing how to use your tools to present a view on that history you are interested in and is easy for you to understand.
To me this is like saying to a construction worker: “The problem is not that your hammer has sharp spikes coming out of the handle at every angle. The problem is that you don’t put on a chain mail glove when using it.” That’s certainly one way to look at it.
IIRC, GitHub uses a development model where partially implemented features are actually deployed to production, but hidden behind feature flags.
I'm pretty sure the point is that this is a one-person project and the author can play around. He's not suggesting that your team of 100 people adopt this for the development of your commercial product.
When you are using linear histories and rebasing you don't do monolithic feature branches. You land smaller chunks and gate their functionality via some configuration variable. `if (useNewPath) { newPath(); } else { oldPath(); }` and all your new incremental features land in `newPath`. All tests pass on both code paths and nothing breaks. When the feature is fully done then you change the default configuration to move to the `newPath`.
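A minimal Python sketch of that gating pattern, in case it helps. The function names and the `use_new_path` parameter are hypothetical stand-ins for whatever configuration mechanism a real project uses; the point is only that both paths stay testable while the default remains on the old one.

```python
def old_path(x):
    """Existing behaviour that production currently relies on."""
    return x * 2


def new_path(x):
    """New implementation, landed in small increments behind the flag."""
    return x + x


def compute(x, use_new_path=False):
    """Dispatch on a configuration flag; the default stays on the old path."""
    return new_path(x) if use_new_path else old_path(x)


# The same expectation is checked on both code paths, so each small
# change to new_path is verified before the default is ever flipped.
for flag in (False, True):
    assert compute(21, use_new_path=flag) == 42
```

Once the feature is complete, flipping the default of `use_new_path` is a one-line change, and the old path can be deleted in a later commit.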
> How are you going to deal with non-trivial feature branches that need to be integrated into master?
That's the point -- this isn't a thing in rebase workflows. That's a feature. You don't have to deal with megapatches for massive features. It's incrementally verified along the way and bisect works flawlessly.
I wonder about performance, though. Why is the author's method slower than the package I linked?
I've been using githashcrash [1], but it's only running on the CPU, which is why it's a bit slower. :-)
Of course if the date only has seconds resolution it may be too big of a shift to be reasonable.
Also exploratory branches where any nonsense may go on (that may end up being merged, at least partially!). Also test/development vs. production branches! One may be broken, the production branch should ideally never be in a state that cannot be deployed.
That said, keep the branches limited and try to keep them 'linear' in the sense that you don't want to be merging between 100 different non-main branches in some byzantine nightmare. Perhaps encourage merges only to the development branch and then rebranching.
Well, why don't you simply copy the code into a new directory and commit that? Then you can do whatever you want in the scratch directory.
But isn't this bad practice? My grug brain refuses to commit anything that does not pass tests. Check tests, then commit. Check tests, then commit.
You can hide your as yet incomplete feature inside an undocumented option, and work from there, without breaking anything.
This is an ideal case, of course.
Examples:
"shit show 14" gets converted to "git show 00000140"
"shit log 10..14" translates to "git log 00000100..00000140"
[1]: https://github.com/zegl/extremely-linear/blob/main/shit
Shouldn't "shit show 14" get converted to "git show 0000014"?
In the hook:
prefix=whatever
old=$(git rev-parse HEAD)
new=$(brute force $prefix)
git update-ref -m "chose prefix $prefix" --create-reflog HEAD "$new"
Of course, it's pretty silly and slow.
I mean: Imagine going back in time 20 years to when git, hg, and bzr were created and telling the creators of those tools: "Hey, while designing your technology, you should be aware that it'll end up being used as a worldwide centralized monorepo run by Microsoft, and no one will ever use any of that distributed stuff."
They'll either laugh you out of the room or you'll be in trouble with the Department of Temporal Investigations for polluting the time line, because what we currently understand as git sure as hell won't be the design they'll come up with.
So for me: I prefer centralized. And SVN is just a reasonable one to use.
... and, very notably, the hash of the parent commit. That is also part of the commit, which means that changing a parent commit would also imply changing the hashes of all later commits. This is sort of the whole point of git/version control.
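Here is a small Python sketch of why that is. It hashes commit-shaped text the way git hashes a commit object (SHA-1 over a `commit <size>\0` header plus the body). The author and committer lines are fixed placeholder values and every commit points at the empty tree, so this is an illustration under those assumptions and not a replacement for git.

```python
import hashlib

# Well-known object ID of git's empty tree.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"


def commit_id(tree, parent, message):
    """SHA-1 of a commit object; author/committer lines are placeholders."""
    lines = [f"tree {tree}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    lines.append("author A U Thor <author@example.com> 0 +0000")
    lines.append("committer A U Thor <author@example.com> 0 +0000")
    body = ("\n".join(lines) + "\n\n" + message + "\n").encode()
    header = b"commit " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def chain(messages):
    """Build a linear history and return the ID of every commit in it."""
    ids, parent = [], None
    for message in messages:
        parent = commit_id(EMPTY_TREE, parent, message)
        ids.append(parent)
    return ids


original = chain(["one", "two", "three"])
rewritten = chain(["one (amended)", "two", "three"])

# Only the first message changed, yet every later ID differs too,
# because each commit embeds the ID of its parent.
for before, after in zip(original, rewritten):
    print(before[:12], "->", after[:12])
```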
For me on a Ryzen 5800HS laptop, lucky_commit generally takes 11–12 seconds. I’m fine with spending that much per commit when publishing. The three minutes eight-character prefixes would require, not quite so much.
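Those numbers line up with simple arithmetic: each extra hex digit multiplies the expected work by 16, and 11 to 12 seconds times 16 is roughly three minutes. The Python sketch below brute-forces a short prefix over a plain string with a made-up `nonce:` trailer. It is not how lucky_commit works internally and it does not hash a real commit object; it only illustrates the search and its cost.

```python
import hashlib
from itertools import count


def find_nonce(message, prefix):
    """Try nonces until sha1(message + nonce trailer) starts with prefix."""
    for nonce in count():
        candidate = f"{message}\n\nnonce: {nonce}\n".encode()
        digest = hashlib.sha1(candidate).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest


def expected_attempts(hex_digits):
    """Average number of tries needed to hit a prefix of that many hex digits."""
    return 16 ** hex_digits


# A 3-digit prefix needs about 4096 tries on average, so this returns quickly.
nonce, digest = find_nonce("one", "000")
print(nonce, digest)

# Going from a 7-digit to an 8-digit prefix costs 16 times as much.
print(expected_attempts(8) // expected_attempts(7))  # 16
```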
What I’m actually doing is generating a 7-digit incremental number followed by a fixed 0. Some UIs show 7 characters and some show 8, this felt like a nice compromise. Plus it’s easier to distinguish between the prefix and the suffix when looking at the full SHA when they are always separated by a 0.
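As a sketch of that scheme in Python (the function names are mine, not taken from the actual script), the translation from a counter to the 8-character prefix looks like this:

```python
def commit_prefix(n):
    """7-digit zero-padded counter followed by a fixed '0' (8 characters)."""
    if not 0 <= n < 10**7:
        raise ValueError("counter does not fit in 7 digits")
    return f"{n:07d}0"


def expand_range(spec):
    """Expand each side of a 'A..B' range (or a single number) to a prefix."""
    return "..".join(commit_prefix(int(part)) for part in spec.split(".."))


print(commit_prefix(14))       # 00000140
print(expand_range("10..14"))  # 00000100..00000140
```

A UI that shows 7 characters displays `0000014`, and one that shows 8 displays `00000140`, so the counter reads the same either way.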
It's a combination of the "repo size" (as in, estimated number of objects) and a hard floor of seven characters.
You can see this by running "git log --oneline --abbrev=7" on any non-trivially sized repository (e.g. linux.git). There's plenty of hashes that uniquely abbreviate to 7 characters, but they're currently all shown with 12 by default.
$ git init x
Initialized empty Git repository in /tmp/x/.git/
$ cd x
$ git commit --allow-empty -m one
[master (root-commit) 4144321] one
$ git log --oneline
4144321 (HEAD -> master) one
$ lucky_commit
$ git log --oneline
0000000 (HEAD -> master) one
$ git commit --amend --no-edit --reset-author --allow-empty
[master 3430e13] one
$ git log --oneline
3430e13 (HEAD -> master) one
$ lucky_commit
$ git log --oneline
0000000f (HEAD -> master) one
$ git reflog --oneline
0000000f (HEAD -> master) HEAD@{0}: amend with lucky_commit
3430e13 HEAD@{1}: commit (amend): one
00000005 HEAD@{2}: amend with lucky_commit
4144321 HEAD@{3}: commit (initial): one
$ git reflog expire --expire=now --all
$ git reflog --oneline
$ git log --oneline
0000000f (HEAD -> master) one
$ git gc --aggressive --prune=now
Enumerating objects: 2, done.
Counting objects: 100% (2/2), done.
Writing objects: 100% (2/2), done.
Total 2 (delta 0), reused 0 (delta 0), pack-reused 0
$ git log --oneline
0000000 (HEAD -> master) one
In order to get these 'beautiful' hashes, they're crunching numbers leveraging CPU power?
> but it can also mean to only allow merges in one direction, from feature branches into main, never the other way around. It kind of depends on the project.
That sounds like the Mainline Model, championed by Perforce[0]. It's actually fairly sensible.
[0] https://www.perforce.com/video-tutorials/vcs/mainline-model-...
At risk of coming across as a humorless Hacker News commenter, I will add that I enjoyed this post. It’s a neat hack!
Where in the worst dystopian parts of software do we do this?
The SHA1 is kind of a security feature if anything, a side-show thing that should be nestled 1-layer deep into the UI and probably most people are unaware of.
Whereas commits and branches should be designed specifically for the user - not 'externalized artifacts' of some acyclic graph implementation.
Git triggers a product designer's OCD so hard, it's hard for some of us not to disdain it out of spite.
A SHA-1 might not look friendly to a dev who doesn’t understand it, but as someone who works with hash values all the time, having my repo be a Merkle tree gives me a warm fuzzy.
Your 'warm and fuzzy' comes at the cost of confusion (even to yourself), not having any clue what the information really means.
It's not even clear that it's a commit, it could be anything.
This posture is exactly what I'm complaining about: it's objectively bad design engineering, embraced as though somehow it's 'smart'.
Git has a few problems like this.
They aren't perfect, of course. All they indicate is in which order the current clone of the repo saw the commits. So two clones could pull the commits in different order and each clone could have different revision numbers for the same commits.
But they're still so fantastically useful. Even with their imperfections, you know that commit 500 cannot be a parent of commit 499. When looking at blame logs (annotate logs), you can be pretty sure that commit 200 happened some years before commit 40520. Plus, if your repo isn't big (and most repos on GitHub are not that big by number of commits), your revision numbers are smaller than even short git hashes, so they're easier to type in the CLI.
If all hashes were prefixed with "h", it would have been so simple to add another (secure) hash and a serial number.
E.g. h123456 for the sha1, k6543 for sha256 and n100 for the commit number.
https://git-scm.com/docs/git-gc#_configuration
Git does something called "packing" when it detects "approximately more than <X (configurable)> loose objects" in your .git/objects/ folder. The key word here is "approximately". It will guess how many total objects you have by looking in a few folders and assuming that the objects are uniformly distributed among them (these folders consist of the first 2 characters of the SHA-1 digest). If you have a bunch of commits in the .git/objects/00/ folder, as would happen here, git will drastically over- or under-approximate the total number of objects depending on whether that 00/ folder is included in the heuristic.
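A rough Python sketch of how such a sampling heuristic goes wrong. It assumes the estimate comes from counting one fan-out directory and multiplying by 256. Which directories git really samples, and how many, is not taken from the comment above, and the object IDs are synthetic.

```python
def estimate_loose_objects(object_ids, sample_dir):
    """Estimate the total from one fan-out directory, assuming uniform spread."""
    in_sample = sum(1 for oid in object_ids if oid[:2] == sample_dir)
    return in_sample * 256


# Hypothetical repo: 1000 brute-forced commits that all land in 00/ ...
ids = [f"00{i:038x}" for i in range(10, 1010)]
# ... plus 2560 ordinary objects, 10 in each of the 256 directories.
ids += [f"{d:02x}{i:038x}" for d in range(256) for i in range(10)]

print(len(ids))                           # 3560 objects really exist
print(estimate_loose_objects(ids, "17"))  # 2560, an under-estimate
print(estimate_loose_objects(ids, "00"))  # 258560, a huge over-estimate
```

Sampling an ordinary directory misses the brute-forced commits entirely, while sampling `00/` treats them as if every directory were that full.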
This isn't the end of the world, but something to consider.
I imagine stuff like this and SVN to Git mirroring to work nicely with identical hashes.
It’ll undoubtedly be easier to further expand, but it’s nowhere near pluggable.
> No merge commits are created.
> Fast-forward merges only.
> When there is a merge conflict, the user is given the option to rebase.
The maintainer can enable this for a project.
Kudos to @zegl for this cool project.
I'm still pondering the “almost” ;-).
Brute-forcing hash collisions seems like an April Fool's joke. You can't really be serious that people are going to do this regularly?
I emulate this by counting the number of merges on main:
git rev-list --count --first-parent HEAD
But it's not that traceable (hard to go from a rev back to a commit).
This way, a correctly configured git client (which pulls those refs) can use `git checkout r/1234` to get to that revision. It's also noteworthy that this is effectively stateless, so you can reproduce the exact revisions locally with a single shell command without fetching them from the remote.
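The stateless part can be sketched in a few lines of Python on a toy graph. The commit names are invented, and starting the numbering at 1 is my assumption (it matches the count that `git rev-list --count --first-parent HEAD` would report for the tip); the real setup in that repository may differ.

```python
# Toy history: parents listed first-parent first; names are hypothetical.
GRAPH = {
    "a": [],
    "b": ["a"],
    "x": ["a"],         # commit on a side branch
    "c": ["b", "x"],    # merge into main, main is the first parent
    "d": ["c"],
}


def first_parent_chain(graph, tip):
    """Commits on the first-parent chain of tip, oldest first."""
    chain = []
    current = tip
    while current is not None:
        chain.append(current)
        parents = graph[current]
        current = parents[0] if parents else None
    chain.reverse()
    return chain


def revision_refs(graph, tip):
    """Map 'r/N' to a commit, with N counted from the root along first parents."""
    chain = first_parent_chain(graph, tip)
    return {f"r/{n}": commit for n, commit in enumerate(chain, start=1)}


refs = revision_refs(GRAPH, "d")
print(refs)
# Reverse lookup: which revision number does commit "c" have?
print({commit: ref for ref, commit in refs.items()}["c"])
```

Because the mapping depends only on the history itself, any clone can recompute it and get the same numbers, as long as the main branch is never rewritten.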
The revisions themselves are populated in CI: https://cs.tvl.fyi/depot@c537cc6fcee5f5cde4b0e6f8c5d6dcd5d8e...
I find all this hash inverting quite inelegant.
Hah :D
I have a somewhat related interest of trying to find sentences that have low Sha256 sums.
I made a go client that searches for low hash sentences and uploads winners to a scoreboard I put up at https://lowhash.com
I am not knowledgeable about gpu methods or crypto mining in general, I just tried to optimize a cpu based method. Someone who knows what they are doing could quickly beat out all the sentences there.
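For anyone curious what the CPU-based search boils down to, here is a minimal Python sketch. The candidate sentences are made up, and this is not the Go client mentioned above; it only shows the comparison being made.

```python
import hashlib


def sha256_hex(sentence):
    """Hex SHA-256 digest of a sentence encoded as UTF-8."""
    return hashlib.sha256(sentence.encode("utf-8")).hexdigest()


def lowest(sentences):
    """Sentence whose digest is smallest; equal-length lowercase hex strings
    sort the same way as the numbers they represent."""
    return min(sentences, key=sha256_hex)


def leading_zero_digits(hexdigest):
    """Number of leading '0' hex digits, a rough score for how low a hash is."""
    return len(hexdigest) - len(hexdigest.lstrip("0"))


candidates = [f"The quick brown fox jumps over {n} lazy dogs." for n in range(5000)]
best = lowest(candidates)
print(best)
print(sha256_hex(best), leading_zero_digits(sha256_hex(best)))
```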
Wraparound doesn't really matter, as long as it's spaced long apart.
0<suffix>
10<suffix>
20<suffix>
...
Combined with auto-completion, you preserve the main advantage (ordering) and you are able to quickly compute the hash.
But! How can I collaborate with my team when PR merges are inevitable? O:
I honestly expected this to be from another "really cool date" - April 1st :D