
> I reject the argument that a no-rebase, merge-only history "preserves the true history of how commits were created", because I believe that is irrelevant. What is relevant is what the tree looks like once the merge (or rebase) lands.

That's kind of the point though: being reasonably sure that a commit contains a tree that the committer had seen at some point, instead of making up history with commits that contain trees that the committer never saw at any point at all.

When someone rebases `n` commits, experience has taught me I can't trust any commits other than `HEAD`; chances are any commit printed by `git log "HEAD~${n}..HEAD^"` was never checked out by anyone, much less tested at all.

CI pipelines also usually run only against HEAD at the moment of push, so if someone pushes `n` commits, `n-1` of them are usually ignored by the CI pipeline.

Modifying compiler, linter, or formatter options; adding a new dependency or updating an existing dependency's version; changing default options for the project. Stuff like that might make those commits useless, and if someone notices a problem in HEAD after the rebase and decides to fix it, even if the fix is moved to the earliest possible point, nobody will bother re-testing all those n-1 commits after the fix was added, leaving broken commits that are useless for git bisect.

So I agree that rebase is nice. How most people use it, though, not so nice.



> That's kind of the point though: being reasonably sure that a commit contains a tree that the committer had seen at some point, instead of making up history with commits that contain trees that the committer never saw at any point at all.

I don't really understand why this would be important. If I'm the one committing, I can rebase however I want to rewrite history before merging, so if I'm super adamant that a commit that looks a certain way exists, I can just make that commit and then put commits around it as needed to ensure that it can be merged by fast-forward to preserve it. If I'm not the one committing, why should I care about intermediate states that the person who committed them didn't even care about enough to preserve?

To me, the issue seems more that the UX for doing this sort of thing is not intuitive to most people, so the amount of effort needed to get the history rebased to what I described above often ends up being higher than people are willing to spend. This isn't a particularly compelling argument to me in favor of merging workflows, though, because it doesn't end up making the history better; it just removes most of the friction of merging by giving up any semblance of sane commit history.

(edited to add the below)

> When someone rebases `n` commits, experience has taught me I can't trust any commits other than `HEAD`; chances are any commit printed by `git log "HEAD~${n}..HEAD^"` was never checked out by anyone, much less tested at all.

I definitely agree that generating broken commits during a rebase is not a good thing for anyone, and I'd be super frustrated if I had teammates doing that. At least personally, I make sure to compile and run unit tests before continuing after each step of a rebase after I've fixed conflicts; there's even the `x` option in an interactive rebase to execute a command on each commit (which will halt and drop into commit and allow you to amend before continuing if it fails), which is unfortunately not super well known.
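For anyone who hasn't used it, a sketch of what that looks like (`make test` is a placeholder for whatever checks your project has):

```shell
# Replay the branch onto main, running the checks after every replayed
# commit; the rebase stops on the first commit where they fail, so that
# commit can be amended before continuing.
git rebase --exec "make test" main
```

The same `exec` lines can also be inserted by hand between specific commits in the `git rebase -i` todo list.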


> I don't really understand why this would be important.

It is important because not everyone does this:

> [...] and then put commits around it as needed to ensure that it can be merged with by fast-forward to preserve it.

Good quality rebases like those are more likely to happen on patch-based workflows (not necessarily email), compared to PR-based workflows, because there's more focus on the individual commits themselves being meaningful, with straight line history being mostly a nice side-effect. With "more likely" I mean literally that, more likely; I'm not saying it only happens there.

In PR-based workflows on the other hand, people tend to care only about HEAD. PR color is green? LGTM ship it :rocketemoji:. Most just read blog post by git shaman saying straight line pretty and then go to GitHub and enable the setting for that without thinking more than that; or learn that you can reorder commits to tell pretty story and do it without thinking more than that.

Though it's also true that some repository owners only care about the tagged commits; all untagged commits could be broken and they don't care because "it's supposed to be in progress" and "as long as the most recent commit works, it's fine". They've never needed to check out any specific commit on any repository (understandable if they never contribute to others' repositories).

---

Also, you probably noticed already because of your edit, but re:

> If I'm not the one committing, why should I care about what intermediate states that the person who committed them don't even care about enough to preserve?

With "intermediate states" I don't mean what other people committed; I mean all your own commits that you just rebased (all your own commits whose hash changed) that are not the most recent one.

You are in the minority that fixes those; most people I've met would be like:

* All commits are tested and work fine.

* Create PR.

* See CI fails because branch is outdated.

* Rebase PR onto most recent commit of main branch.

* See CI fails because, idk, let's say it's something easy to fix like a more strict linter config.

* Make a new commit that fixes the linter errors.

* CI passes.

* Everyone LGTM's the PR and it gets fast-forwarded.

* The PR had `n` commits, but now `n-1` of those fail the linter because they contain the new config for the linter, but the committer never bothered to look at those commits, they only cared about HEAD. Those `n-1` commits "contain trees that the committer never saw at any point at all" (copy-pasting that quote from my message). And it doesn't matter that those commits are broken because for those people having pretty straight line is way more important than a working commit.

The recent FreeBSD/Netflix thingy[1] had a successful bisect only because when people rebase stuff in there, they don't YOLO those `n-1` rebased commits. If that had been any of my previous workplaces, or anyone who only rebases because "straight line pretty" without thinking anything more than that, then that whole bisect could have gone way worse.

[1]: https://news.ycombinator.com/item?id=40630699


All of your points seem accurate to me, but I don't see how merge workflows fix any of it. It seems like the same thing could happen where each commit along the way is broken until the final one, and then it's merged as-is. I don't think that having those intermediate commits being the exact ones that the person made is a solution because the problem you're describing is social, not technical; people not caring about committing messy intermediate state to the repo isn't going to be fixed by using merging rather than rebasing. The only workflow that would eliminate the problem entirely is to completely remove all intermediate state by squashing to a single commit before any merge, at which point doing a merge versus a rebase won't matter.


Neither workflow fixes anything. Each strategy helps with some things, but requires discipline in others.

Using merges lets you commit as you go, without needing to go back to repeat a test on a previous commit, and only worry about conflicts at the end of your development. Write code, test, commit. Write more code, test, commit. Cherry-pick, test, commit. Merge into main, fix conflicts, finish merge. There's never a need to go back and re-test, like with rebase, because the commits that were already tested are still there. But they require discipline to not pollute history, and being open to squashing commits that don't add any useful information (you want to avoid having "WIP"-style commits).
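A minimal sketch of that flow (branch and commit messages are invented):

```shell
# Hypothetical feature work; each commit keeps the exact tree it was
# tested with.
git switch -c feature
# ...write code, test...
git commit -am "Add parser"
# ...write more code, test...
git commit -am "Handle empty input"

# Conflicts only need to be dealt with once, at the end:
git switch main
git merge --no-ff feature    # fix conflicts if any, then finish the merge
```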

Using rebases lets you rewrite commits to take advantage of the most recent changes from the main branch, instead of waiting until you finish with your feature. But they require discipline to go back and repeat tests to ensure that any commit that changed still works as expected (and it's needed because the commits changed, hence their different hash, so they are no longer the commit hashes that were tested), and being open to having some merge commits (you want to avoid rebasing a 10 commit migration of your telemetry library because if 3 months later you find out your costs in production were way higher than what they told you they would be, reverting a single merge commit is more dumbproof compared to reverting a manually provided range of commits).

So yes, choosing one or the other is a social problem. Both are good solutions with good discipline, and both are bad solutions with bad discipline. One of those makes it less likely for people in my bubble to make a mess out of that repo. It might be the same as for your bubble, or it might be different.

But on a good project it doesn't really matter which one is done.


> So yes, choosing one or the other is a social problem. Both are good solutions with good discipline, and both are bad solutions with bad discipline. One of those makes it less likely for people in my bubble to make a mess out of that repo. It might be the same as for your bubble, or it might be different.

> But on a good project it doesn't really matter which one is done.

I appreciate your explanations! I think I understand your point of view now, and I do actually agree with it. In particular, I hadn't fully considered that the problem ultimately being social means that the "best" choice will be mostly dependent on what consensus a group is able to come to.

Thinking about this more, it almost seems like having a preference could become self-reinforcing; it's hard to be a member of a group that reaches a consensus on using merges as someone who prefers rebases (and likewise for the reverse), which over time manifests as more and more anecdotal evidence in favor of the preference working better than the alternative. It's no wonder that debates about this sort of thing become so contentious over time...


In a merge-based workflow you can have commits like "wip" or "before lunch"; no reason to believe those were ever tested either.

I like rebasing but it's ultimately up to the author. Even tools like Fossil, that don't have official history rewriting tools, don't ensure that history has never been rewritten because people can use external tools to do the rewriting (and I've done this).


Use a temporary branch for those. When you come back, undo the commit (`git reset --soft HEAD^` if memory serves, but I just have an "uncommit" alias) and commit the fully finished work to the real branch.
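One possible way to set up such an alias (an assumption about what the alias does; `--soft` undoes the last commit but leaves its changes staged):

```shell
# Hypothetical "uncommit" alias: drop the last commit, keep the work.
git config --global alias.uncommit 'reset --soft HEAD^'

# On the temp branch: undo the WIP commit without losing anything.
git uncommit
```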


This is destroying the "real history" though. (Which again, I'm fine with, I like rebasing.)

Two months from now I'm quite likely to say something like "oh yeah, I remember I encountered a bug related to that, I was trying to fix it before lunch". The "wip" and "before lunch" commits are just as likely to be relevant in the future as any other.

It's nice to assume that all commits will compile and pass the tests, but it's sometimes useful to have a snapshot of that weird compiler error you encountered. So much for our nice assumption.

This is why I say it's all up to the author, and if the author likes rebasing, I don't think anyone should have a problem with that. (Don't rewrite public branches, of course.)


There's levels of granularity that matter. You could just as well record all your edits in realtime. Make a script that makes a commit every second or every time you finish editing a line. It might be interesting later, yet that's usually not how people use git. Those changes wouldn't be meaningful units of work.

If you make a commit "wip" or "before lunch" because you want a backup of your work or want to continue on a different computer, then it's not a meaningful unit either. It's OK to throw away.

Most people prefer less granular commits, but not to the point of having 1 commit per issue/PR. For example, after inheriting someone else's code written in a hurry and not tested, I often end up dividing my work into several commits: first there's a cleanup of all the things that need renaming for consistency, adding docs/tests, removing redundant/unused code, etc. Sometimes this ends up being more commits as I reveal more tech debt. Then, when I'm confident I actually understand the code and it's up to my standards, I make the actual change. This can again be multiple commits. The first and second group are often mixed.

And it's important when it later turns out I broke something: I can focus on the commits that make functional changes, as the issue is usually there and not in the cleanup commits, which can be 10x larger.

BTW what git is really missing is a way to mark multiple commits as one unit of work so the granularity stays there but is hidden by default and can be expanded.


> BTW what git is really missing is a way to mark multiple commits as one unit of work so the granularity stays there but is hidden by default and can be expanded.

Is that not just a non-FF'd, non-squashed merge of a branch?


This is my preferred branching model. Most forges seem to call it "semi-linear history". If you have a lot of people working on the repo you'll probably want a merge queue to handle landing PRs, but that's pretty straightforward.

It works really well with things like git bisect. It also means history is actually useful.


That's the closest you get today, but it means having to make, merge, and delete branches all the time. What I propose is something like a squash that keeps the history internally. It would present as one commit in gitk and other GUIs but could be expanded to see more detail.


> it means having to make, merge and delete branches all the time

Isn't this something that git makes simple?


Does gitk have an equivalent of `git log --first-parent`?


In the View menu dialog, there's a checkbox for "Limit to first parent"


> Make a script that makes a commit every second or every time you finish editing a line. It might be interesting later, yet that's usually not how people use git. Those changes wouldn't be meaningful units of work.

Every JetBrains IDE does this, and VS Code has its own equivalent feature. They don't use git, but it's the same thing really. It's one of the most useful features ever IMO.


If people say "preserve history" as in "literally don't delete anything", then yeah I see where you're coming from.

I'm not against rebase, and even use it myself. But having a repo where every 3rd commit is a dice roll for git bisect just because straight line pretty, is just as annoying as people shipping their reflog.

A rebase of one commit is harmless. A squash is harmless. A rebase of multiple commits where every commit is deliberate (verifying all rebased commits, etc) is harmless.

A rebase that ignores the fact that any commit whose hash changed can now fail is irresponsible. Shipping `wip` commit messages is irresponsible. A merge commit with the default message is irresponsible (it's no different from a `wip`-style commit). So is having a branch with merge commits that could have been cherry-picks[3].

Also, to me the lie is not some aesthetic thing like commit order or some easily forgeable timestamp; the lie is having a commit that (for example) assumes the `p4tcc` driver is being used[1], and you read the diff and indeed it has assumptions that imply that driver is being used[2], but when you actually checkout that commit and see if that driver exists it turns out no it fucking doesn't, and hours were wasted chasing ghosts. Only because when that commit was created, the p4tcc driver was being used, but when you checked out weeks later now that commit magically uses the `est` driver instead.

If you're going to keep straight line, then test every change; if you don't do it, don't complain about broken middle commits.

If you're going to do merge commits, then keep each commit clean[4], even the merge commit[5]; if you don't, don't complain about a history that is polluted with weird commits and looks like the timeline of a time-travelling show.

[1]: Because it did when that commit was created.

[2]: Because, again, it did when that commit was created.

[3]: This assumes the branch will later be integrated into main with a merge commit.

[4]: Squash is harmless. It's just omission. If anyone complains about purity, then just keep them happy with `git reset $COMMIT ; git add --all ; git commit -m "This is a new commit from scratch"`

[5]: Write something that helps those who use `git log --first-parent`. If you're on GitHub, at least use PR title and description as default (can be overridden on a case-by-case basis). If not, then even just "${JIRA_ID}: ${JIRA_TITLE}" is more useful than the default merge commit message while still letting you be lazy.
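E.g. something like this instead of the default "Merge branch 'feature'" message (ticket ID and title are invented placeholders):

```shell
# A merge subject that a --first-parent reader can actually use:
git merge --no-ff -m "PROJ-123: Migrate telemetry to new library" feature
```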


Yeah, exactly, it’s up to the author to determine what’s important to preserve. Note this is always true, because the author is the one who commits, and can do anything before committing. If keeping the “before lunch” commit is useful for the history, rebasing does not prevent that in any way. Personally, I doubt that particular comment really is just as likely to be useful as something describing what the change is, but I’m with you that it’s author’s choice. It seems like squashing “WIP” and “before lunch” and describing the change content & reasoning has quite a bit higher likelihood of usefulness down the road than a comment on when you planned to eat lunch, and that has been true for me in practice for many years.

There is no “real history” in git, and it’s kind of a fictitious idea, even in Fossil or other VCSes that don’t offer rebase. Think about it: commit order of “WIP” ideas in a branch is already arbitrary, and commits only capture what you committed, not what you typed, nor when you ran the build, nor what bugs you fixed before committing, nor what you had for lunch, nor anything you didn’t choose to commit. Taking away rebase only adds extra pressure to plan your commits and be careful before committing, which means that people will do more editing that is not captured by the commit “history” before committing! Having rebase allows you to commit willy-nilly messes as you go and know that nobody has to see it. It seems like rebase might very well be safer in general because it encourages use of the safety net rather than discouraging frequent messes… and we’re all making frequent messes regardless of VCS, all we’re talking about is whether we force the rest of the team to have to be subjected to our messes.

Git provides change dependencies, and does not offer “history” in the sense you’re implying. People overload the word “history”, and git’s sense of history is to show the chain of state dependencies known as commits, and those have editable metadata on them. In other words, git’s “history” is a side-effect, a view of the dependencies. Git’s “history” does usually have loose association with an order of events, but nothing is or ever was guaranteed. It is by design that you can edit them (meaning build a new set of dependencies with rewritten metadata… the old one is still there until garbage collection), therefore there is no “real history”, that’s not a real thing.


It depends on what you view as the real history. If you link each pull request to a work item you're not going to really need all the commits on a development branch, because the only part of the history which matters is the pull request.

I think people should just use what works for them; if that's rebase, who cares? The important part is being able to commit "asd" 9 billion times. If you can't do that it will tax your developers with needlessly having to come up with reasons why they committed before lunch… that meeting… going to the toilet, and so on.


That's just an interactive rebase with extra steps.


> That's kind of the point though: being reasonably sure that a commit contains a tree that the committer had seen at some point, instead of making up history with commits that contain trees that the committer never saw at any point at all.

I don't see how this follows. Merge-heavy histories in my experience tend to be far less bisectable. They have all sorts of "oops, fixup" nonsense going on, precisely because the author did not take the time to get things right the first time.

Any workflow that happens on a number of patches greater than 1 accepts poor bisectability as a risk. But the only real solution there is Giant Monolithic Commits, which we all agree is even worse, right?


Yeah if "merge-heavy" means "ship the reflog", I get what you mean.

But if "merge-heavy" means "use merges when it makes sense, use rebase when it makes sense", then you can get a nice history with `git log --first-parent` that groups related commits together, and also a nice history with `git log --cherry` that shows what the "always-rebase-never-merge" dogmatic people want.
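Roughly, the two views side by side (branch names are placeholders):

```shell
# One line per merge (i.e. per PR/feature), hiding the commits inside:
git log --first-parent --oneline main

# The individual commits on the branch, marking/omitting ones that are
# patch-identical to commits already upstream:
git log --cherry --oneline main...feature
```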

If for this particular project it just so happens that merge doesn't make sense because of the specific needs of the project, then so be it, nothing wrong with that. Same with rebases.

Unfortunately this topic is another holy war where the ship-the-reflog dogma fights against the always-rebase-never-merge dogma.

No balance.

> I don't see how this follows. Merge-heavy histories in my experience tend to be far less bisectable. They have all sorts of "oops, fixup" nonsense going on, precisely because the author did not take the time to get things right the first time.

That sounds more like merge-only (a.k.a. "ship the reflog"). Doesn't have to be that way.

Evaluate trade-offs and choose based on that evaluation.

Does adding a new commit have any actual advantage (e.g. easily reverting one or the other) compared to just amending/squashing it, or is it just some developer's own subjective sense of purity?

Does re-ordering the commits have any actual advantage (e.g. change has a smaller context and can be more easily reverted that way) compared to just leaving those commits in that order, or is it just some developer's own subjective sense of aesthetics?

Does using merge commits bring any actual advantage (e.g. the project benefits from being able to bisect on PRs or features as a whole) compared to rebasing (not fast-forwarding), or is it just some developer's own subjective sense of purity?

Does rebasing bring any actual advantage (e.g. each commit is already atomic, fully self-contained, and well tested against the new base, so "grouping" them with a merge commit doesn't make sense) compared to doing a merge-commit, or is it just some developer's own subjective sense of aesthetics?

> Any workflow that happens on a number of patches greater than 1 accepts poor bisectability as a risk.

Poor bisectability or developers putting actual effort into ensuring commits are atomic and test them.

Bisectability is nice with good rebased commits. Bisectability is nice with good merge commits.

Bisectability is bad when developers don't care about keeping bisectability good.

> But the only real solution there is Giant Monolithic Commits, which we all agree is even worse, right?

It depends.

Those commits might not be easy to understand, but they sure as hell are easy to revert (more likely than not) if something goes wrong, because they tend to correspond almost 1:1 to GitHub issues (or Jira tickets, or whatever equivalent). Keyword "almost" because sometimes you can get 2 of those for the same issue/ticket/whatever.

But those 2 improperly split (therefore huge) commits are still easier to revert compared to a spray of 10 improperly rebased tiny commits where 9 of them are broken (because of what I mention in other comments where people only test HEAD).


Too much text. But what I will say is that the "good" merge workflow you posit really only exists in one place (Linux) and requires a feudal hierarchy of high value maintainers individually enforcing all the rules via personal virtuosity. I've never seen it scale to a "typical" project run by managers and processes.

Whereas the straightforward "get your stuff rebased into a linear tree" tends to work pretty well in practice. The rules are simpler and easier to audit and enforce.


> leaving broken commits useless for git bisect.

Honestly all of this is just a function of Git’s tooling being pretty bad.

There’s no reason that a merge based history can’t be presented as a linear history. That’s purely a matter of how the information is displayed!

Similarly, there’s absolutely no reason that git bisect should try to operate on every single commit. We live in a world where CI systems need to land a queue of commits at a time. No one can afford to run every test on every commit. Git should have support for tagging commits that had varying levels of tests and then running bisect against only those commits. Easy peasy.


I think most people these days just look at PRs. Everything else is largely noise.


Not at all. Someone’s got to look at your commits in the future when your code breaks ;)


Yep, this is why I'm mostly against squashing (and completely against blind squash-merges).


I'm not sure I get the advantage. The only thing I know is that the last commit on each PR is the one that is claimed to work. All others might as well be noise at that point since those random intermediates were never HEAD on the main branch, might be broken, incomplete, have failing tests, etc.. Squashing every PR into a single commit is at least an honest history of what's actually going out.

If you squash you have a history where every commit was tested and works (bugs notwithstanding) which to me is way more useful.


> (bugs notwithstanding)

This is the reason. I've been on a maintenance team for years where almost everything we handle was written by people no longer at the company, and often enough I've seen bugs get introduced during the original work, where the fix ends up being obvious because I can see the original commits and how the code got into its current state. A squash of any sort would've hidden the refactor and made it much more difficult.

My favorite are ones where "linting" and code formatter commits introduce bugs. Keep those separate from your actual work, please.


I mean you should be designing your commits such that each individual commit builds. That's the point of using squashes to fix up your history!


Commit refactoring can be really hard work, however. Basically you do something like taking N random commits and converting these into M logical ones, where each one delivers incremental value and builds upon the previous.

For some types of work it is easy, N=M: you were able to do high quality value adding atomic commits for the whole PR without rework.

For other types N >> M. This can happen when trying different approaches to a hard problem. I suppose research type work could always be considered a POC and the actual implementation could be a kind of cleanroom re-implementation of the POC but there isn't always time for such things and (again) the PR is far more important than the commits that built up to it - particularly if the resultant code is of equal quality. Note that I am not advocating for long running branches here - trunk based development is generally better provided it doesn't over incentivize teams to avoid hard problems (but that is a topic for another day).

This is why I think git should include the PR as a first class concept. For simple N=M type work, 1 PR should generally be 1 commit. Why not, after all, make the PR small and easy to review when you can? For harder N >> M type work, you get one PR with many commits that one can dig into if necessary.


Oh I'm for squashing to make the history make sense. Please do not blindly squash-merge though.


And messy intermediate states won't help him at all. Nor does it help when related commits are intertwined with unrelated commits in history rather than being grouped together.


If you want the full history of someone's work, you need all the edits. Including all the times they backspaced over the typo. With down-to-the-millisecond timestamps attached!


Although git-rebase has a tool to support testing the rebased non-HEAD commits: `--exec`. Append `--exec "build-incrementally && rerun-affected-tests"` in whatever form is appropriate for your project and git will compile and test every non-HEAD commit for you.


When I rebase I diff after each rebase to check that the only diffs are the ones I intended. So I, the committer, have seen all my rebased commits.
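One way to do that check, for anyone wondering (this relies on `ORIG_HEAD`, which rebase sets to the pre-rebase tip):

```shell
git rebase main
# ORIG_HEAD still points at the pre-rebase tip, so the two trees can be
# compared directly; the only diff should be what main added:
git diff ORIG_HEAD HEAD
# range-diff pairs up the old commits with the rewritten ones:
git range-diff ORIG_HEAD...HEAD
```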


This assumes the rebasing will be done in a shared branch. Two rules of rebasing:

1. Never rebase a shared branch

2. Never break rule 1


No, that describes rebasing and preserving the intermediate commits. Of course, if you squash into one commit at merge time, this won't happen.


> nobody would bother re-testing all those n-1 commits after the fix was added

I do.



