
I write code in R and often find myself trying to optimize it for better performance. In a script that tackles a specific problem, I test different alternatives and compare them with benchmarks, then select the most performant method. However, I don't know how to document those benchmark tests.

I'll use an example to demonstrate (based on a real problem I asked about). In R, I want to write code that nests a data frame by group. I compare three possible methods:

```r
library(bench)
library(ggplot2)     # provides the mpg dataset
library(dplyr)
library(data.table)
library(collapse)

bench::mark(dplyr = mpg %>% group_by(manufacturer, year) %>% summarise(nest_cty = list(cty)),
            data.table = {MPG <- data.table(mpg); MPG[, .(nest_cty = list(cty)), by = list(manufacturer, year)] },
            collapse = mpg %>% fgroup_by(manufacturer, year) %>% fsummarise(nest_cty = list(cty)),
            check = FALSE)
#> # A tibble: 3 x 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 dplyr        4.69ms    5.5ms      184.    2.38MB     5.56
#> 2 data.table   2.37ms   2.51ms      391.    2.16MB     0   
#> 3 collapse     95.2us  101.8us     9560.  206.56KB     6.22
```

The benchmarking table shows that the third option (collapse) is the most performant, so that is the method I'll choose when finalizing my script. But I still want to document that I tested different methods, along with the benchmark results, so that future me or collaborators can understand my choices.

How should I document this? I understand that writing such a "story" in comments inside the script is considered bad practice. Another option is git commits. However, I find it too verbose to include such an explanation in a commit message. Furthermore, git commits are about tracking changes in code, whereas what I need here is meta-information about a general strategy rather than a specific code change.

Christophe
Emman

4 Answers


It is not bad practice to put this in a comment in the code. Your goal is not to show everyone you did your homework; you want to prevent a successor from thinking "this is silly, I can do this more elegantly", typing away, and "fixing" it. The only place where this works is in the code. You can be brief: no tables with test results, just "this turned out to be about x times faster than that, so I went with this solution instead of that one". Putting this in a design document is pointless; nobody will read it at the moment it matters.
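As an illustration of the kind of brief comment this answer describes, here is a minimal sketch applied to the question's nesting problem. The speed-ups quoted in the comment are derived from the median times in the question's benchmark table (5.5ms for dplyr, 2.51ms for data.table, 101.8us for collapse); the object name `nested` is hypothetical, and the native pipe `|>` assumes R 4.1 or later.

```r
library(ggplot2)   # only needed here for the mpg example dataset
library(collapse)  # fgroup_by(), fsummarise()

# Nest cty by manufacturer and year with collapse rather than dplyr or
# data.table: in a bench::mark() comparison on mpg, collapse was about 25x
# faster than data.table and about 50x faster than dplyr (median times).
nested <- mpg |>
  fgroup_by(manufacturer, year) |>
  fsummarise(nest_cty = list(cty))
```

Three lines of comment are enough to stop a well-meaning rewrite; the full table is not needed at this spot.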

Martin Maat

This is the kind of information that you could document in some design document. The purpose is to keep track of important choices that were made and why.

Choices where alternatives were seriously considered are good candidates for such documents, especially if the choice is a key element of your algorithm or a general strategy/technique/pattern that you may use in several places (as your real problem suggests). This avoids losing time reassessing the same questions over and over again.

Comments in the code should remain concise and sharp and should not distract with lengthy justification and historical reasons.

Edit: Strategies/techniques/patterns that are too specific for the general design, but worth knowing and reusing, could be explained in a separate design pattern document with all the justification needed. As @davidbak suggested in the comments, you can refer to this document in a concise comment, getting the advantages of both approaches.
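A minimal sketch of such a pointer comment, assuming the design notes live in a file called `docs/design-decisions.md` with a section titled "Nesting a data frame by group" (both hypothetical), and that the nesting step is wrapped in a hypothetical helper `nest_cty_by_group()`:

```r
# Nest cty by manufacturer and year using collapse.
# Why collapse and not dplyr/data.table: see docs/design-decisions.md,
# "Nesting a data frame by group" (hypothetical path and section name).
nest_cty_by_group <- function(df) {
  df |>
    collapse::fgroup_by(manufacturer, year) |>
    collapse::fsummarise(nest_cty = list(cty))
}
```

The code comment stays short, while the lengthy justification, including the benchmark table, lives in the referenced document; usage would be something like `nest_cty_by_group(ggplot2::mpg)`.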

Christophe

I would suggest a different approach. You did benchmarking work, collecting numbers on which you based a decision.

If you can put that benchmark in with the rest of your code, in a form that can be rerun at will, others can see what you did, what you considered, and what you concluded. It also lets others rerun the script later, when things might have changed, to see whether one of the other approaches has become more viable, instead of future maintainers blindly accepting an ancient conclusion (also known as https://en.wikipedia.org/wiki/Cargo_cult_programming).
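One way to do this in R is to keep the benchmark as a standalone script in the repository. The sketch below assumes a hypothetical file `bench/nest_by_group.R` and reuses the question's three expressions (with the data.table conversion still inside the timed code, as in the original comparison). Printing `sessionInfo()` at the end records the R and package versions, which a future reader needs in order to judge whether the old conclusion still holds.

```r
# bench/nest_by_group.R  (hypothetical location, kept next to the analysis code)
# Re-run with: Rscript bench/nest_by_group.R
# Compares three ways to nest cty by manufacturer and year; the production
# code uses collapse because it was fastest when this was last run.
library(bench)
library(ggplot2)     # mpg dataset
library(dplyr)
library(data.table)
library(collapse)

results <- bench::mark(
  # .groups = "drop" only silences dplyr's regrouping message
  dplyr      = mpg |> group_by(manufacturer, year) |>
                 summarise(nest_cty = list(cty), .groups = "drop"),
  data.table = { MPG <- data.table(mpg)
                 MPG[, .(nest_cty = list(cty)), by = list(manufacturer, year)] },
  collapse   = mpg |> fgroup_by(manufacturer, year) |>
                 fsummarise(nest_cty = list(cty)),
  check = FALSE  # the three approaches return objects of different classes
)

print(results)

# Record the R and package versions that produced these numbers.
print(sessionInfo())
```

Because the script only measures and prints, it never affects the analysis itself; rerunning it after a package upgrade shows directly whether the original choice still holds.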


Git is a fine place to store this alternative functionality, either in a feature branch or as a commit in the history.

Most code quality tools discourage commented-out code; comments should explain code, not be code themselves.

Some answers here encourage a bad practice that goes against industry-wide and heavily scrutinized code quality tools, which makes me wonder whether their rationale is extremely niche, or even valid in its own niche (have they even challenged their own opinion?).

Git itself is a favourite of mine for these exact use cases, and I also quite like the idea of adding a design document as Markdown in the corresponding repo, or as a pull request write-up.

Stof