73

Does anyone know if there is some kind of tool to put a number on technical debt of a code base, as a kind of code metric? If not, is anyone aware of an algorithm or set of heuristics for it?

If neither of those things exists so far, I'd be interested in ideas for how to get started with such a thing. That is, how can I quantify the technical debt incurred by a method, a class, a namespace, an assembly, etc.

I'm most interested in analyzing and assessing a C# code base, but please feel free to chime in for other languages as well, particularly if the concepts are language transcendent.

Thomas Owens

12 Answers

41

Technical debt is an abstract idea: somewhere along the line, while designing, building, testing, and maintaining a system, certain decisions were made such that the product has become more difficult to test and maintain. Having more technical debt means it becomes harder to continue developing the system - you either cope with the debt and allocate more and more time to what would otherwise be simple tasks, or you invest resources (time and money) in reducing it by refactoring the code, improving the tests, and so on.

There are a number of metrics that might give you some indication as to the quality of the code:

  • Code coverage. There are various tools that tell you what percentage of your functions, statements, and lines are covered by unit tests. You can also map system and acceptance tests back to requirements to determine the percentage of requirements covered by a system-level test. The appropriate coverage depends on the nature of the application.
  • Coupling and cohesion. Code that exhibits low coupling and high cohesion is typically easier to read, understand, and test. There are code analysis tools that can report the amount of coupling and cohesion in a given system.
  • Cyclomatic complexity is the number of linearly independent paths through a piece of code, typically counted at the method/function level. It relates to the understandability and testability of a module: higher values mean someone will have more trouble following the code, and they also tell you how many test cases are needed to cover the code's independent paths.
  • The various Halstead complexity measures provide insight into the readability of the code. They count the operators and operands to determine volume, difficulty, and effort, and can indicate how difficult it will be for someone to pick up and understand the code, for instance during a code review or when a new developer joins the code base (the sketch after this list shows how these and cyclomatic complexity are derived from simple counts).
  • Amount of duplicate code. Duplicated code can indicate potential for refactoring into shared methods. Having duplicate code means there are more places where a bug can be introduced, and a higher likelihood that the same defect exists in multiple locations. If the same business logic exists in multiple places, the system becomes harder to update when requirements change.
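
Both cyclomatic complexity and the Halstead measures boil down to counting things in the code. Below is a minimal sketch, in C# since that's the code base in question, of how the numbers fall out once a parser (or a careful read of one method) has produced the counts; the counts, names, and example figures are invented for illustration only.

```csharp
// Illustrative only: these counts would normally come from a parser or a
// careful read of a single method, not be typed in by hand.
using System;

class ComplexitySketch
{
    // Cyclomatic complexity of one method:
    // 1 + number of decision points (if, while, for, case, &&, ||, ?:, catch).
    static int CyclomaticComplexity(int decisionPoints) => 1 + decisionPoints;

    // Halstead measures from operator/operand counts:
    // n1/n2 = distinct operators/operands, bigN1/bigN2 = total occurrences.
    static (double Volume, double Difficulty, double Effort) Halstead(
        int n1, int n2, int bigN1, int bigN2)
    {
        double vocabulary = n1 + n2;
        double length = bigN1 + bigN2;
        double volume = length * Math.Log(vocabulary, 2);
        double difficulty = (n1 / 2.0) * ((double)bigN2 / n2);
        return (volume, difficulty, difficulty * volume);
    }

    static void Main()
    {
        // A method with 4 ifs and 2 && operators has 6 decision points,
        // so CC = 7: seven linearly independent paths, which is the number
        // of test cases basis path testing calls for.
        Console.WriteLine(CyclomaticComplexity(decisionPoints: 6));

        var h = Halstead(n1: 12, n2: 15, bigN1: 40, bigN2: 33);
        Console.WriteLine($"Volume={h.Volume:F1}, Difficulty={h.Difficulty:F1}, Effort={h.Effort:F1}");
    }
}
```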

Often, static analysis tools will be able to alert you to potential problems. Of course, just because a tool indicates a problem doesn't mean there is a problem - it takes human judgement to determine whether something could be problematic down the road. These metrics just give you warnings that it might be time to look at a system or module more closely.

However, these attributes focus on the code. They don't readily indicate any technical debt in your system architecture or design that might relate to various quality attributes.

Thomas Owens
25

Sonar has a technical debt heuristic as well as several other features useful to a software project.

It also supports a pretty wide range of languages.

SonarQube (formerly Sonar) is an open source platform for Continuous Inspection of code quality...

  • Supports 25+ languages: Java, C/C++, C#, PHP, Flex, Groovy, JavaScript, Python, PL/SQL, COBOL, etc.
  • SonarQube is also used in Android development.
  • Offers reports on duplicated code, coding standards, unit tests, code coverage, complex code, potential bugs, comments and design and architecture.
  • Time machine and differential views.
  • Fully automated analyses: integrates with Maven, Ant, Gradle and continuous integration tools (Atlassian Bamboo, Jenkins, Hudson, etc.).
  • Integrates with the Eclipse development environment
  • Integrates with external tools: JIRA, Mantis, LDAP, Fortify, etc.
  • Expandable with the use of plugins.
  • Implements SQALE methodology to compute technical debt...
gnat
5

I think the question is how much would it cost to "buy back" your technical debt--that is, how much work is it to fix it? Well, it's up to the team to figure that out.

During sprint planning, I ask the team to estimate the complexity of fixing technical debt items in the same way they would estimate the complexity of a user story. At that point, it's a negotiating game between the team and the product owner to determine which technical debt is high enough priority to be done in the current sprint (displacing actual user stories) and what can wait.

If you're not doing scrum, I'd stick to my premise--technical debt should be measured by the cost of the remedy.

Matthew Flynn
5

I hate to use an analogy from finance, but it seems really appropriate. When you're pricing something (assets of any kind), it can have both intrinsic and extrinsic value. Here, the existing code has intrinsic value, a quantity corresponding to the relative quality of the code, and it also has extrinsic value (value from what could be done with the code), and those quantities are additive. The intrinsic value can be broken down into credits and debits (good vs. bad) using whatever methodology you use to score the code (+5 for comments/readability, -10 for missing code coverage, etc.).

I certainly don't know of any tools that quantify this today, and you'd have an entirely new discussion on your hands arguing the merits of different "debt valuation" strategies, but I agree with Matthew -- the debt is the cumulative cost of getting the code as good as you can possibly get it, using whatever method you use to cost out the man-hours it takes to get there.

Something else to consider: there is certainly a cost-effectiveness aspect here, whereby as one gets closer to "perfection", the value of an hour spent on the code base most likely decreases exponentially, so there is probably an additional optimization problem of maximizing the utility of the work done.
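
As a minimal sketch of the credit/debit scoring idea above (assuming a recent C# version): the rules, weights, and file names are all invented for illustration; plug in whatever findings and weights your own methodology produces.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// One scored observation about a file: positive = credit, negative = debit.
record Finding(string File, string Rule, int Score);

class DebtScoring
{
    static void Main()
    {
        var findings = new List<Finding>
        {
            new("Orders.cs",  "Well commented / readable",     +5),
            new("Orders.cs",  "No unit test coverage",        -10),
            new("Billing.cs", "Duplicated business logic",    -15),
            new("Billing.cs", "Low coupling to other classes", +5),
        };

        // Intrinsic value per file = sum of its credits and debits.
        foreach (var file in findings.GroupBy(f => f.File))
            Console.WriteLine($"{file.Key}: {file.Sum(f => f.Score)}");

        // A strongly negative total marks the debt-heavy parts of the code base.
        Console.WriteLine($"Total: {findings.Sum(f => f.Score)}");
    }
}
```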

5

Among developers, a fairly reliable measure of technical debt seems to be WTFs/minute.

The issue with this "metric" is that it is typically rather difficult to communicate "outside".

A metric that worked for me in communicating technical debt to "outsiders" was the amount of testing and bug-fixing effort (especially the effort spent fixing regression bugs) needed for a successful delivery.

A word of caution: although this approach is quite powerful, one had better double-check with good old WTFs/minute before resorting to it. The thing is, it is quite cumbersome: to get the data, one has to carefully track time and accurately log it against the appropriate categories.

  • It is so much easier to state "3 weeks total spent on implementing feature A" than:
     
    I spent 14 hours on draft implementation of feature A then 29 hours on smoke testing it then 11 hours on implementing fixes for regressions I discovered, then 18 hours testing the QA-ready feature implementation. After that, QA guys spent 17 hours on testing the initial candidate release. After that I spent 13 hours analyzing bugs submitted by QA for the initial candidate release and 3 hours implementing the fixes. After that, I spent 11 hours on smoke testing the changes I made to initial candidate release. After that...

Anyway, data about testing and bug-fixing effort has been quite easy to communicate in my experience.

For the recent release, we spent about 90% of our time on testing and fixing regression bugs. For the next release, I suggest allocating some effort to getting this value down to 60-70%.
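
Once the time log exists, the number being communicated is just one ratio. Here is a small sketch, with hours grouped roughly from the illustrative breakdown above; the category names are invented.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class WasteRate
{
    static void Main()
    {
        // Hours per category, taken roughly from the breakdown quoted above.
        var hours = new Dictionary<string, double>
        {
            ["Feature implementation"]   = 14,
            ["Smoke testing"]            = 40, // 29 + 11
            ["Regression and bug fixes"] = 14, // 11 + 3
            ["Developer and QA testing"] = 35, // 18 + 17
            ["Analyzing reported bugs"]  = 13,
        };

        double total = hours.Values.Sum();
        double waste = total - hours["Feature implementation"];

        // Prints roughly 88% - the "90% of time on testing and regression
        // fixes" figure above is this same ratio measured over a whole release.
        Console.WriteLine($"Waste rate: {waste / total:P0}");
    }
}
```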


Another word of caution: data like the 90% above could be interpreted not only as an indication of technical debt, but also (surprise surprise) as an indication that one is not quite proficient in programming or in the particular technology: "You just make too many bugs in your code".

If there is a risk of the data being misinterpreted that way, it helps to have additional reference data on something less WTF-prone to compare against.

  • Say there are two similar components / applications maintained by the same developer(s), the first releasing at a "waste rate" of about 50% and the second at 80-90%; this makes a pretty strong case that the second is carrying technical debt.

If there are dedicated testers on the project, they can also contribute to a more objective evaluation of the data. As I mentioned in another answer:

With testers, you get someone to back up your understanding of design issues. When there are only developers complaining about code quality, this often sounds like subjective WTFs from behind a closed door.
 
But when this is echoed by a QA guy saying something like component A had 100 regression bugs for 10 new features, as opposed to component B which had 10 regression bugs per 20 new features, communication suddenly turns into a whole different game.

gnat
3

There's a pretty strong platform out there called CAST for finding technical debt in big applications. We used it on a project where we took over a big enhancement to a legacy system. It doesn't tell you what was in the heads of the people who wrote the code, but it examines the code, finds code and architecture flaws, and then quantifies them as technical debt if you want. The real use in looking at this, though, is not the dollar amount but the list of problems already in the code. That tells you about a portion of the technical debt you have (so I do disagree with some of the answers above).

Some technical debt is purely design-based, and that is very subjective - like pornography, you know it when you see it and know the context. I would argue whether that's really "technical" debt. Some technical debt is purely in the implementation, and I believe that's worth measuring and tracking.

2

Here is a Webinar out of MIT describing research on technical debt in large software systems: http://sdm.mit.edu/news/news_articles/webinar_050613/sturtevant-webinar-technical-debt.html

The authors wrote code to analyze a project and pull out 'architectural complexity' metrics. These metrics were shown to have a strong relationship with defect density, developer productivity, and development staff turnover.

The work described in the Webinar builds on modularity research done by Alan MacCormack and Carliss Baldwin at Harvard Business School. I would look at their papers as well. Their 'propagation cost' might be what you are looking for.

1

I'd say the standard code metrics can be used as a high-level, relative view of technical indebtedness. Visual Studio Ultimate includes a code analyzer that will give you a "Maintainability Index" based on cyclomatic complexity, coupling, lines of code, and depth of inheritance. You can drill down into any trouble spots and see details (down to the function level). I just ran it on my project; the lowest scores we got were 69 on our Data package (configuring and initializing EF) and on our test suite. Everything else was 90 or above. There are other tools that will give you more metrics, like those discussed in Uncle Bob's PPP.
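
For reference, the commonly cited formula behind that Maintainability Index combines three of the metrics discussed elsewhere on this page (Halstead volume, cyclomatic complexity, and lines of code). The sketch below uses that commonly cited form; treat the exact coefficients and rounding as the tool's business rather than a definitive spec.

```csharp
using System;

class MaintainabilityIndex
{
    // Commonly cited formula: MAX(0, (171 - 5.2*ln(HalsteadVolume)
    //   - 0.23*CyclomaticComplexity - 16.2*ln(LinesOfCode)) * 100 / 171).
    static double Compute(double halsteadVolume, int cyclomaticComplexity, int linesOfCode)
    {
        double raw = 171
                     - 5.2 * Math.Log(halsteadVolume)
                     - 0.23 * cyclomaticComplexity
                     - 16.2 * Math.Log(linesOfCode);
        return Math.Max(0, raw * 100 / 171); // rescaled to 0-100, higher is better
    }

    static void Main()
    {
        // A mid-sized method: Halstead volume ~500, CC = 8, 60 lines.
        Console.WriteLine(Compute(500, 8, 60)); // prints roughly 41
    }
}
```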

Michael Brown
1

I work for a company that is looking into exactly this. Below are three actionable metrics that we recommend looking at when tackling technical debt. For more information on "how" and "when" to track them, we put together a summary article, 3 Metrics to Understand and Tackle Technical Debt.

What are your thoughts? Happy to answer any questions and hungry to hear your feedback :).

Ownership to prevent defects & unwanted tech debt

Ownership is a leading indicator of engineering health.

The parts of the codebase receiving contributions from many people accumulate cruft over time, while those receiving contributions from fewer people tend to be in a better state. It's easier to maintain high standards in a tight group that is well-informed about their part of the codebase.

This provides some predictive power: weakly owned parts of the codebase are likely to accumulate debt over time and become increasingly hard to work with. In particular, it's likely for debt to be unintentionally taken on, simply as a side-effect of incomplete information and diluted ownership of the code's quality.

This is somewhat analogous to the tragedy of the commons.

Cohesion to improve the architecture

Cohesion is a trailing indicator of well-defined components.

Cohesion and its counterpart, coupling, have long been recognised as important concepts to focus on when designing software.

Code is said to have high cohesion when most of its elements belong together. High cohesion is generally preferable because it's associated with maintainability, reusability, and robustness. High cohesion and loose coupling tend to go hand in hand.

Beyond being associated with more reusable and maintainable code, high cohesion also minimises the number of people who need to be involved in modifying a given part of the codebase, which increases productivity.
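
Cohesion can also be put on a number. One classic (if blunt) metric is LCOM, lack of cohesion of methods, in the Chidamber & Kemerer style: count the pairs of methods that share no instance field, subtract the pairs that share at least one, and floor at zero. A sketch with an invented method-to-field map follows; a real tool would derive that map from the code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class LcomSketch
{
    // LCOM (CK variant): pairs of methods with no shared field minus pairs
    // with at least one shared field, floored at zero.
    static int Lcom(Dictionary<string, HashSet<string>> fieldsUsedByMethod)
    {
        var methods = fieldsUsedByMethod.Values.ToList();
        int sharing = 0, disjoint = 0;

        for (int i = 0; i < methods.Count; i++)
            for (int j = i + 1; j < methods.Count; j++)
                if (methods[i].Overlaps(methods[j])) sharing++;
                else disjoint++;

        return Math.Max(0, disjoint - sharing);
    }

    static void Main()
    {
        // A class whose methods fall into two unrelated groups scores badly,
        // hinting that it should probably be split.
        var lowCohesion = new Dictionary<string, HashSet<string>>
        {
            ["PrintInvoice"] = new() { "invoiceLines", "taxRate" },
            ["TotalDue"]     = new() { "invoiceLines", "taxRate" },
            ["SendEmail"]    = new() { "smtpHost" },
            ["RetryEmail"]   = new() { "smtpHost" },
        };
        Console.WriteLine(Lcom(lowCohesion)); // 4 disjoint pairs - 2 sharing pairs = 2
    }
}
```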

Churn to identify problem areas

Churn (repeated activity) helps identify and rank areas ripe for refactoring in a growing system.

As systems grow, it becomes harder for developers to understand their architecture. If developers have to modify many parts of the codebase to deliver a new feature, it will be difficult for them to avoid introducing side-effects leading to bugs, and they will be less productive because they need to familiarise themselves with more elements and concepts.

This is why it's important to strive for single responsibility to create a more stable system and avoid unintended consequences. While some files are architectural hubs and remain active as new features are added, it's a good idea to write code in a way that brings closure to files, and rigorously review, test, and QA churning areas.

Churn surfaces these active files so you can decide whether they should be broken down to reduce the surface area of change in your codebase.
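
As a rough sketch of how both the churn and the ownership signals described above can be pulled straight from version history (assuming git is on the PATH and the program runs inside the repository; a real tool would more likely use a library such as LibGit2Sharp than shell out):

```csharp
using System;
using System.Diagnostics;
using System.Linq;

class ChurnAndOwnership
{
    // Run a git command in the current directory and return its stdout.
    static string Git(string args)
    {
        var psi = new ProcessStartInfo("git", args)
        {
            RedirectStandardOutput = true,
            UseShellExecute = false
        };
        using var process = Process.Start(psi);
        string output = process.StandardOutput.ReadToEnd();
        process.WaitForExit();
        return output;
    }

    static void Main()
    {
        // Churn: how many commits have touched each file.
        var hotspots = Git("log --name-only --pretty=format:")
            .Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(line => line.Trim())
            .GroupBy(file => file)
            .OrderByDescending(g => g.Count())
            .Take(10);

        foreach (var file in hotspots)
        {
            // Ownership: how many distinct authors have touched this file.
            int authors = Git($"log --pretty=format:%an -- \"{file.Key}\"")
                .Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries)
                .Select(name => name.Trim())
                .Distinct()
                .Count();

            Console.WriteLine($"{file.Key}: {file.Count()} commits, {authors} authors");
        }
    }
}
```

High-churn files with many loosely involved authors are the ones the ownership and churn arguments above flag as most likely to be accumulating debt.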

0

I wouldn't think of technical debt as dollars where you need a fancy model to quantify it. I would think of it as favors. If someone does you a favor and you are likely to forget, you write it down. When you take a shortcut, write it down. This helps you remember and, more importantly, forces you to acknowledge it. No fancy tool is needed; Notepad or Excel can do the trick.

MathAttack
0

Quantifying technical debt still has a lot of challenges.

If we go with the original use of the term, where Ward Cunningham used it to illustrate how incremental development can be a better approach than waterfall, we could say that every project that works with real-life changing requirements has technical debt, while all waterfall projects should be free of it, but this is not a very precise quantification.

On the other hand, when using tools to get more precise numbers, research shows that we have to keep in mind the limitations of these tools, before making any decision based on their reports:

  • Different tools might use different definitions and report findings that are hard to statistically correlate.
  • The tools are not very accurate, sometimes overestimating fixing costs by a factor of 20.
  • The classification of the reported issues is also not very accurate; most of the issues classified as bugs are in practice unlikely to lead to faults.
  • As static analyzers are themselves software products that evolve, one version may report many more findings than the previous one (after adding new detectors), or far fewer (after making the detectors more precise). So different versions of the exact same tool can report different numbers on the exact same source code (in some cases this can also depend on whether the free or the paid variant of the tool is used).
  • In our research, we found that this situation means that a large part of the contemporary research published at the most prestigious venue for Technical Debt research might not be reproducible, which is not the best foundation on which to build further research and tools, let alone make business decisions. https://www.researchgate.net/publication/357875475_Reproducibility_in_the_technical_debt_domain

There is still a lot more work needed.

-1

If you have a good history via a bug tracker or some sort of agile planning software, you can keep it simple: look at the time spent completing basic tasks, and at how reliable estimates were when the project was young vs. now.
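
A small sketch of that idea, comparing how badly similar routine tasks overran their estimates early in the project versus recently; the work items and dates below are invented stand-ins for data exported from your bug tracker.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// One closed task exported from the bug tracker (invented data).
record WorkItem(DateTime Closed, double EstimatedHours, double ActualHours);

class EstimateDrift
{
    static void Main()
    {
        var items = new List<WorkItem>
        {
            new(new DateTime(2012, 3, 1),  4,  5),
            new(new DateTime(2012, 4, 1),  6,  6),
            new(new DateTime(2013, 9, 1),  4,  9),
            new(new DateTime(2013, 10, 1), 6, 14),
        };

        var cutoff = new DateTime(2013, 1, 1);

        double Overrun(IEnumerable<WorkItem> ws) =>
            ws.Average(w => w.ActualHours / w.EstimatedHours);

        // If the same kind of routine task now overruns its estimate far more
        // than it used to, that drift is a visible symptom of accumulated debt.
        Console.WriteLine($"Early overrun factor:  {Overrun(items.Where(w => w.Closed < cutoff)):F2}");
        Console.WriteLine($"Recent overrun factor: {Overrun(items.Where(w => w.Closed >= cutoff)):F2}");
    }
}
```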

Erik Reppen