No Reason To Squash

Arialdo Martini — 28/11/2022 — git

TL;DR

  • Even without squashing, Pull Requests already include a squashed commit
  • You can get an on-the-fly squashed view of history using --first-parent
  • Squashing would not save space
  • Squashing makes git bisect less effective
  • Once squashed, details are lost forever
  • Joining is easier than separating
  • Squashing may promote sloppy habits
  • Don’t squash, rebase

• Squash! 1-commit PRs are easier to review

If you squash, Pull Request reviewers will just have to read one single commit. Easier.

‣ Except that the PR already contains a squashed commit

You don’t have to squash: the merge commit already contains the whole branch, squashed.

[Figure: the merge commit contains the whole branch, squashed]

When the PR is not squashed, you can review both the final result and each single step. You can comment on, amend, or exclude each single commit. And you can still see the PR as one single, unified change by looking at the merge commit.

On the contrary, if the PR is squashed, you only have the final result, and all the single steps are lost forever.
It is hard to expect painstaking precision once the details have been melted together.
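As a minimal sketch of this point, the script below builds a throwaway repository, merges a two-commit branch with an explicit merge commit, and then compares the merge commit with its first parent. The file name, branch names and commit messages are made up for illustration; the sketch also assumes the PR is merged with --no-ff, as a non-squashing merge typically would be.

```shell
#!/bin/sh
set -e

# Throwaway repository; every name below is hypothetical.
demo=$(mktemp -d)
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# A feature branch made of two small commits.
git checkout -q -b feature
echo "step 1" >> app.txt
git commit -q -am "Add step 1"
echo "step 2" >> app.txt
git commit -q -am "Add step 2"

# Merge with an explicit merge commit.
git checkout -q main
git merge -q --no-ff -m "Merge branch 'feature'" feature

# The "squashed" view: the merge commit compared with its first parent.
squashed_diff=$(git diff HEAD^1 HEAD)

# The detailed view: every commit of the branch is still reachable.
branch_commits=$(git rev-list --count HEAD^1..HEAD^2)

echo "$squashed_diff"
echo "commits on the branch: $branch_commits"
```

The diff shows the whole branch as one change, while HEAD^1..HEAD^2 still lists each step for anyone who wants to review them one by one.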

• Squashing makes history cleaner

A history like:

[Figure: squashed git history]

is cleaner than one displaying all the single commits of each pull request:

[Figure: not squashed git history]

‣ Except that you can get the same, on-the-fly

You don’t need to squash to hide details. Just use --first-parent.

[Figure: SmartGit's follow-only-first-parent option]

This works from the command line using --first-parent:

[Figure: result of git log --first-parent]

and with some Git GUI clients such as SmartGit and Magit. Not all Git frontends support --first-parent, though.
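To make the difference concrete, here is a small sketch that merges two hypothetical PR branches (two commits each) into a throwaway repository and then counts what the full log and the --first-parent log display. All names and messages are invented for the example.

```shell
#!/bin/sh
set -e

# Throwaway repository; branch and file names are hypothetical.
demo=$(mktemp -d)
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"
git commit -q --allow-empty -m "Initial commit"

# Merge two PR branches, each made of two commits, without squashing.
for pr in 1 2; do
    git checkout -q -b "pr-$pr"
    echo "a" > "file-$pr.txt"
    git add "file-$pr.txt"
    git commit -q -m "PR $pr: first step"
    echo "b" >> "file-$pr.txt"
    git commit -q -am "PR $pr: second step"
    git checkout -q main
    git merge -q --no-ff -m "Merge PR $pr" "pr-$pr"
done

# Full history: every single commit plus the merge commits.
all_commits=$(git rev-list --count HEAD)

# Condensed history: only the commits along the first-parent chain.
first_parent_commits=$(git rev-list --count --first-parent HEAD)

git log --first-parent --format=%s
echo "all: $all_commits, first-parent: $first_parent_commits"
```

Nothing is deleted to obtain the shorter view: the same repository answers both queries.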

Side note

“[T]he problem isn’t the extra information: it’s that the information isn’t displayed in a way that shows them what they’re interested in” (David Chudzicki). In other words, making the history clean is mostly a matter of data display, not data collection. You can store all the details and still be able to only show the merge commits.

That’s why Git provides the option --first-parent in the first place.


• At least squash after the PR merge! Nobody will need the single commits afterwards

Ditto. Who cares?

‣ Except: good luck using git bisect

You will probably regret having squashed the history the next time you troubleshoot.

git bisect is your best friend when searching for the commit that introduced a bug: if the developers stick to the good habit of committing early, often and small, git bisect has every chance of landing on a commit so small that the offending line is obvious.

With squashed and large commits, you are left alone troubleshooting by hand.
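A minimal sketch of what this looks like in practice, assuming a throwaway repository with eight tiny commits, one of which introduces a line marked BUG. The file name, the marker and the check passed to git bisect run are all made up; in a real project the check would be a test or a build command.

```shell
#!/bin/sh
set -e

# Throwaway repository; names and the "BUG" marker are hypothetical.
demo=$(mktemp -d)
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"
git commit -q --allow-empty -m "Initial commit"
good=$(git rev-parse HEAD)

# Eight small commits; the fifth one introduces the bug.
for i in 1 2 3 4 5 6 7 8; do
    if [ "$i" -eq 5 ]; then
        echo "line $i BUG" >> app.txt
    else
        echo "line $i" >> app.txt
    fi
    git add app.txt
    git commit -q -m "Small change $i"
done

# HEAD is known to be bad, the initial commit is known to be good.
git bisect start HEAD "$good" > /dev/null

# The check exits 0 (good) when the bug is absent, 1 (bad) when present.
git bisect run sh -c '! grep -q BUG app.txt' > /dev/null

# With the default terms, refs/bisect/bad now points at the first bad commit.
culprit=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset > /dev/null 2>&1

echo "first bad commit: $culprit"
```

Had the eight commits been squashed into one, the same search could only have pointed at that single large commit.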


• Squashing cleans up WIP and temporary commits

The reviewer cares about the net effect of the PR, not about the half-implemented commits, the broken ones that do not even compile, the fixed typos, the amendments and the like.

‣ Except that this promotes sloppy behaviors

Don’t commit broken code in the first place.

Conscientious developers do review their work before submitting a pull request, and each and every one of their commits builds, has green tests and is potentially deployable.

Git offers scrupulous developers all the tools for tidying up their commits:

  • commit --amend and commit --fixup for amending commits
  • rebase --interactive for deleting, reordering, squashing commits
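As a rough sketch of those tools at work, the script below fixes a commit message with commit --amend, records a correction to an older commit with commit --fixup, and folds it in with an interactive rebase. File names and messages are invented; setting GIT_SEQUENCE_EDITOR=true is only a trick to accept the proposed rebase plan without opening an editor, so that the example can run unattended.

```shell
#!/bin/sh
set -e

# Throwaway repository; file names and messages are hypothetical.
demo=$(mktemp -d)
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"
git commit -q --allow-empty -m "Initial commit"
root=$(git rev-parse HEAD)

# First commit, with a typo in the file content.
echo "helo" > greeting.txt
git add greeting.txt
git commit -q -m "Add greeting"
target=$(git rev-parse HEAD)

# Second commit, with a typo in the message, fixed right away by amending.
echo "bye" > farewell.txt
git add farewell.txt
git commit -q -m "Add farewel"
git commit -q --amend -m "Add farewell"

# Fix the typo introduced two commits ago and mark it as a fixup of it.
echo "hello" > greeting.txt
git commit -q -a --fixup="$target"

# Let the rebase move the fixup next to its target and squash it in.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash "$root"

tidy_count=$(git rev-list --count "$root"..HEAD)
fixed_content=$(git show HEAD~1:greeting.txt)

git log --format=%s "$root"..HEAD
```

The branch ends up with two clean commits, and the typo never appears in the history that reviewers see.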

There is really no excuse for pushing a pull request containing commits that do not compile.

If the policy can be read as:

Don't worry, no matter the mess, all your commits will be squashed into one

you can be sure that no one will break their back to avoid the mess.

I have seen this happen: mandatory squashing rules eventually translated into tolerated sloppy habits.


• Squashing saves disk space

If you don’t squash, all those commits will knock Git down!

‣ Except it doesn’t

Reducing the Scala repository (38,098 commits) to one (1) single commit saves only 47% of the space:

Try it yourself:

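# Names of the full clone and of its single-commit copy
repo=scala
squashed="${repo}-squashed"

rm -fr "${repo}" "${squashed}"

# Clone the full history and repack it as tightly as possible
git clone "https://github.com/scala/${repo}.git"
cd "${repo}" && git gc --prune --aggressive
cd ..

mkdir "${squashed}"
cd "${squashed}"

# Fetch only the latest snapshot and turn it into a single root commit
git init
git fetch --depth=1 -n "../${repo}"
git reset --hard $(git commit-tree "FETCH_HEAD^{tree}" -m "initial commit")

git gc --prune --aggressive

# Compare the size of the two directories
cd ..
du -sh "${repo}" "${squashed}"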

[Figure: the whole Scala repository squashed to a single commit]


• I’m telling you: squash! Look how awful

All valid arguments. But the reality speaks for itself. That’s the Scala repository, which does not use squashing:

[Figure: the Scala repository]

Look instead at how TypeScript went from not squashing (on the left) to squashing (on the right):

[Figure: the TypeScript repository, before and after squashing PRs]

‣ Except, there are saner workflows

In all honesty, if the alternative to squashing is having horrible Git histories like those, I’m all for squashing.

But there’s a reason why they are so convoluted: in those repositories PRs are merged without being rebased first. When PRs are rebased before merging, the result looks like the Haskell Cabal repository:

[Figure: the Cabal repository]

With a sane and disciplined workflow, it’s not hard to have both all the details and a clean history.

But this deserves a separate article.
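In the meantime, here is a minimal sketch of one such workflow, assuming that "rebased before merging" means rebasing the PR branch onto the tip of the main branch and then merging it with an explicit merge commit. The repository, branch names and messages are hypothetical, and real projects may well do this differently.

```shell
#!/bin/sh
set -e

# Throwaway repository; every name below is hypothetical.
demo=$(mktemp -d)
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"
git commit -q --allow-empty -m "Initial commit"

# A PR branch with two small commits.
git checkout -q -b feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Feature: step 1"
echo "more" >> feature.txt
git commit -q -am "Feature: step 2"

# Meanwhile, main moves on.
git checkout -q main
echo "other" > other.txt
git add other.txt
git commit -q -m "Unrelated change on main"

# Rebase the PR branch onto the tip of main, then merge without squashing.
git checkout -q feature
git rebase -q main
git checkout -q main
git merge -q --no-ff -m "Merge branch 'feature'" feature

# After the rebase, the branch forks exactly from the merge's first parent.
fork_point=$(git merge-base HEAD^1 HEAD^2)
main_before_merge=$(git rev-parse HEAD^1)
detailed_commits=$(git rev-list --count HEAD^1..HEAD^2)

git log --graph --format=%s
```

Each PR then shows up as a short, self-contained bump in the graph: the single commits are preserved, and the first-parent chain stays linear.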
