25

Code formatting matters. Even indentation matters. And consistency is more important than minor improvements. But projects usually don't have a clear, complete, verifiable and enforced style guide from day 1, and major improvements may arrive any day. Maybe you find that

SELECT id, name, address
FROM persons JOIN addresses ON persons.id = addresses.person_id;

could be better written as / is better written than

SELECT persons.id,
       persons.name,
       addresses.address
  FROM persons
  JOIN addresses ON persons.id = addresses.person_id;

while working on adding more columns to the query. Maybe this is the most complex of all four queries in your code, or a trivial query among thousands. No matter how difficult the transition, you decide it's worth it. But how do you track code changes across major formatting changes? You could just give up and say "this is the point where we start again", or you could reformat all queries in the entire repository history.

If you're using a distributed version control system like Git you can revert to the first commit ever, and reformat your way from there to the current state. But it's a lot of work, and everyone else would have to pause work (or be prepared for the mother of all merges) while it's going on. Is there a better way to change history which gives the best of all results:

  • Same style in all commits
  • Minimal merge work

?

To clarify, this is not about best practices when starting the project, but rather what should be done when a large refactoring has been deemed a Good Thing™ but you still want a traceable history? Never rewriting history is great if it's the only way to ensure that your versions always work the same, but what about the developer benefits of a clean rewrite? Especially if you have ways (tests, syntax definitions or an identical binary after compilation) to ensure that the rewritten version works exactly the same way as the original?

l0b0
  • 11,547

6 Answers6

30

Do the reformatting as separate commits. This will interfere minimally with the history, and you should be able to see at a glance which commits are just reformatting and which actually change code. It could skew git blame and similar, but if it points to a reformat-only commit, it's fairly straight forward to look for the previous change before that.

harald
  • 1,953
14

Don't rewrite VCS history: it;s against VCS principles.

Don't try to automate fixing the formatting: it is treating the symptoms, not the real problem (= developers not following coding standards).

Define the coding standard and formatting best practices in a common document and get all developers to agree.

You mention Git, which is great, because it's distributed. With a DVCS it's very easy to enforce best practices through the gatekeeper workflow. Gatekeepers reject merge proposals (= pull requests in Git) that don't conform to the common guidelines. And I do mean reject, in bold letters, otherwise the coder in violation will not bother to follow the rules and continue to repeat the same mistakes.

This technique works well for me. Coders want their work to be merged, so after a few mistakes in the beginning they start following the rules.

As per fixing the existing code base... I recommend doing that gradually, perhaps module by module, or as it makes sense for your project. Test carefully at each step. It may sound stupid, but mistakes do happen even with trivial changes like just the formatting, so be prepared for some minor bumps on the road.

janos
  • 1,829
10

The answer to your actual question is, "You don't." I know of no current SCM tool that can trace changes in logic from code formatted in one way, through a major formatting change, and through further changes after the code is formatted in the new way. And, you know this, losing the history on a piece of code is Not Good.

Accordingly, I'm going to contradict your first sentence a bit. Code formatting doesn't matter that much. Pretty is nice, but it's not what we're here for. I understand as well as anybody that getting dumped into somebody's old hellish weird K&R variant code with the two-space indents sucks (1), but... the formatting isn't actually an obstacle to understanding what's going on, unless it's something exceptionally pathological. And in that case, you're going to have problems changing the code anyway, and shouldn't bother it.

Therefore, it's not worth it to make changes to established code STRICTLY to reformat it. Changing the variable names, breaking up long functions, all that good refactoring stuff that changes the content, yes, but not JUST reformatting.

1) - I once owned the Windows Clipboard Viewer for a while. The whole thing was one, 150k, C module. I found a spot where different people had used, I think, five different brace styles within thirty lines of each other. But that section of things WORKED. I carried around a printout of that chunk of code for ten years, but I didn't poke it because that history mattered, and that code was in at least three source trees (Windows 3.x, NT, future 95) which all lived in different buildings.

mjfgates
  • 2,064
  • 2
  • 13
  • 15
4

But how do you track code changes across major formatting changes?

Formatting changes are code changes; treat them as you would any other change to your code. Anyone who has worked on a significant project will probably have seen bugs and other issues that were created when someone decided to "just" reformat some code.

But it's a lot of work, and everyone else would have to pause work (or be prepared for the mother of all merges) while it's going on.

Why do you have to reformat everything all at the same time? Especially if the reformatting doesn't change the meaning of the code, you should be able to reformat files individually and check them in as you go along. Better, get everyone on your team to agree on a style (otherwise there's no point in reformatting anyway) and have them all take care of reformatting in the course of their other work. After a while, you'll have covered most of the code with out disrupting the rest of the project.

Caleb
  • 39,298
1

There are viable two approaches that I've seen for this.

1. Reformat code on commit-hook

While it's initially hair-raising to alter code after they've submitted it, if your reformatting procedure (e.g. astyle) doesn't hurt the code, then its a safe operation. Over time the whole team will appreciate that all code eventually looks the same. Clearly, having comprehensive unit/automated tests will ensure that nothing broke.

2. One-time reformatting of all the code

This is more dangerous in my experience, and makes tracking issues across the big-bang difficult, but it's possible. Running all tests afterwards is essential. For coding style, the majority of differences revolve around use of whitespace - indentation or newlines. A decent merge tool should be able to be told to ignore all whitespace differences, so this will help with merges.

JBRWilkinson
  • 6,769
1

Since the question was originally asked, a much better way of handling the same formatting across the whole team has been introduced, namely an .editorconfig file in the root of your source tree.

This file is either read directly by most modern IDE's or can easily be made to do so and tells exactly how the source should be treated. Tabs versus spaces is a thing of the past.

For new projects this should be added in one of the very first commits.

I would suggest for an existing project that the whole team converges the work to a single branch, and the .editorconfig file is introduced with a reformat of all the sources on that branch in a single commit. Then normal work commences.

See https://editorconfig.org/ for details.