78

I asked a question on lines of code per hour and got torn a new one. So my matured follow-up question is this:

If not lines of code, then what is a good metric by which to measure (by the hour/day/unit-of-time) the effectiveness of remote programmers?

11 Answers

110

In 16 years I've never actually found a workable metric of the sort you're looking for.

Essentially, to be useful, any metric would need to be measurable, representative and ungameable (that is, the system can't be played by clever developers). There are simply too many variables within software development to make it measurable as piece work in this way.

The closest you get is progress against estimates - that is, how many tasks they are completing within the agreed estimates. The trick here is (a) getting good and fair estimates and (b) understanding where estimates have been exceeded for good reasons for which the developer cannot / should not be blamed (that is, something was genuinely more complex than anticipated). Ultimately, if you push developers too hard you're likely to find estimates gradually creeping up to a level where they're always met - not because of increased productivity but because of padded timescales.
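
As a rough sketch, "progress against estimates" could be tracked something like this (the task data and the tolerance value are made-up assumptions for illustration, not a standard formula):

```python
# Sketch: track completion against agreed estimates per developer.
# All task data below is hypothetical.

tasks = [
    # (developer, estimated_hours, actual_hours)
    ("alice", 8, 7),
    ("alice", 16, 20),
    ("bob", 4, 4),
    ("bob", 12, 30),   # overran badly -- worth a conversation, not automatic blame
]

def within_estimate_rate(tasks, developer, tolerance=1.25):
    """Fraction of a developer's tasks finished within tolerance * estimate."""
    own = [(est, act) for dev, est, act in tasks if dev == developer]
    hits = sum(1 for est, act in own if act <= est * tolerance)
    return hits / len(own)

print(within_estimate_rate(tasks, "alice"))  # 1.0 (20 is within 16 * 1.25)
print(within_estimate_rate(tasks, "bob"))    # 0.5
```

The tolerance parameter is one way to encode point (b): a task only slightly over its estimate shouldn't count against the developer.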

Go the other way too much in terms of the estimates (reducing them to create pressure to deliver) and you create phoney deadlines which studies have shown don't increase productivity and are likely to have an impact on team morale (see Peopleware for more information).

But essentially I wonder if you're looking at a slightly false problem. Why are remote programmers different to other programmers when it comes to measuring productivity? How do you measure the productivity of non-remote programmers?

If it's about not trusting them to work remotely then that's basically a wider trust issue. If you don't trust them to work from home then you either need to establish that trust, not let them work from home, or find some way of satisfying yourself that they are indeed working when they're meant to be.

Jon Hopkins
  • 22,774
61

Metrics work best in factories, and programmers don't work on an assembly line.

I completely understand the desire to measure productivity.

But would you use the same metric for a family doctor and a heart surgeon? How about for Michelangelo painting the Sistine Chapel, and some guy in Mexico cranking out black velvet Elvis paintings?

Louis de Broglie wrote a doctoral thesis that was so short, the examiners were going to reject it - except de Broglie was a highly-placed aristocrat, and they needed a good excuse. So the examiners sent it to Einstein, who not only didn't reject it, he referred it to the Nobel committee, and de Broglie got the Nobel Prize in Physics for it five years later.

Numerical measures work best on work that's repetitive, like casting iron or screwing bolts on car doors. But if you're repeating code that's been done before, you don't need a programmer, you just need a copy-and-paste. Programming is fundamentally a creative discipline, and productivity depends entirely on what you're doing.

Some days, I crank out 1000 lines of code. Today, I'm going to be fixing coordinate geometry bugs, and the code might shrink. If I had to fix a bug in a Linux kernel driver, I might spend all day on debugging, and not write a line of new code.

Measuring programmer productivity is very, very, very subjective.

If you want to know if Joe is productive, find Sally and Ralph, who know what Joe's doing and are proficient in the same areas, and ask them.

The best numerical system I've ever seen has been Agile's planning poker points. That's just a fancy way of asking Joe and Sally and Ralph how hard they think Joe's upcoming job is likely to be. Then you can measure points-per-week productivity for each team member. But even then, it takes a while to calibrate a team's estimations, and the numbers are fuzzy and easily thrown off.
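
A points-per-week velocity of the kind described above might be tallied like this (the sprint data is hypothetical, and as noted, the numbers stay fuzzy until the team's estimation has been calibrated):

```python
# Sketch: points-per-week "velocity" from planning-poker estimates.
# Sprint data is invented for illustration.

sprints = [
    # (week, points_completed)
    (1, 13),
    (2, 21),
    (3, 8),    # a bug-hunting week: low points, not necessarily low value
    (4, 18),
]

def average_velocity(sprints):
    """Mean story points completed per week -- a fuzzy, team-relative number."""
    return sum(points for _, points in sprints) / len(sprints)

print(average_velocity(sprints))  # 15.0
```

Note that the number is only meaningful within one team: points are relative estimates, so comparing velocities across teams is exactly the kind of misuse that throws the metric off.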

Many people want productivity estimates so they can do schedule planning. It's kind of the "plug it into MS Project, look at the critical path, and there's your ship date" theory. I have never, ever seen that work - there are just too many unknowns. If you want that, use Waterfall, design everything up front, don't permit any change orders, and be prepared to be disappointed anyway.

Bob Murphy
  • 16,098
40

The only metric I use is the amount of working software produced for a given amount of money invested.

This is regardless of their schedule, whether they work remotely or not, the number of breaks they take, the methodologies they use, or the number of hours they work.

By working software I mean:

List of features defined by user/customer that meets the required quality level

By amount of money:

How much the user/customer paid for the defined features + maintenance costs

So it has a direct impact on how the software is built and on the quality of the work produced, but it is not bound to any source-code line metrics.
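
As a sketch, that metric boils down to a simple ratio (all figures below are invented for illustration):

```python
# Sketch: working software per unit of money, as defined above.
# "Features" means customer-defined features meeting the required quality level.

def cost_per_feature(features_delivered, amount_paid, maintenance_cost):
    """Total spend divided by accepted features -- lower is better."""
    total = amount_paid + maintenance_cost
    return total / features_delivered

print(cost_per_feature(features_delivered=12,
                       amount_paid=48_000,
                       maintenance_cost=12_000))  # 5000.0
```

Including maintenance cost in the numerator is what keeps this from rewarding quick-and-dirty delivery: cheap features that are expensive to maintain score badly.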

25

You need an experienced developer or team leader (who is not associated with those remote programmers) to estimate how long a task should take; effectiveness is then measured by comparing the time actually required against those estimates. To be sure the estimates are good, you could randomly pick a few tasks and have them executed by an in-house team you have under control.
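
A sketch of that comparison, with hypothetical calibration tasks run by both the remote and the in-house team:

```python
# Sketch: compare a remote team's actual time against an independent estimate,
# spot-checked by running a few of the same tasks in-house. Data is hypothetical.

def effectiveness_ratio(estimated_hours, actual_hours):
    """> 1.0 means faster than the estimate, < 1.0 means slower."""
    return estimated_hours / actual_hours

# Randomly sampled tasks also executed by the in-house team,
# used to sanity-check the estimator rather than to rank people:
calibration = [
    # (task, remote_hours, in_house_hours)
    ("import-job", 10, 9),
    ("report-page", 6, 8),
]

for task, remote, in_house in calibration:
    # Using the in-house time as the "estimate" for the remote team's work.
    print(task, effectiveness_ratio(in_house, remote))
```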

user281377
  • 28,434
8

It certainly is possible to devise all kinds of intricate metrics to evaluate performance, but at the end of the day a significant part of your judgment has to rely on subjectivity and input from people who are close to the codebase.

For example, it is quite possible for some team to crank out internally hideous unmaintainable slop at a very fast rate, and this might even meet the required deadline and specification. But is the technical debt accrued from that kind of working style worse than if the team had churned out something robust and maintainable but slipped the deadline a few weeks? It depends.

If the purpose of the question is to resolve some type of productivity problem, I would say that what the manager actually does to facilitate the work of the team is as important as, or more important than, any measuring technique used to evaluate the team. It is a two-way street. In other words, metrics are fine, but if you want more out of any team, the ultimate question is whether or not the manager is doing everything possible to ensure the team can be productive.

This goes far beyond writing a spec, finding a team, throwing the spec "over the wall" and clicking a stopwatch.

Angelo
  • 1,614
7

Another interesting way to measure productivity would be to count automatable tests reviewed by a manager per day. The developer gets:

  • a point for writing an automatable test (and passing review) and adding it to the regression test suite,
  • a point for making them pass, while not causing any other regression test failures.
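
A minimal tally of that two-point scheme might look like this (the event names and data are assumptions for illustration, not a standard tool):

```python
# Sketch: tally the two-point scheme described above. Events are hypothetical.

def score(events):
    """One point per reviewed automatable test added to the suite, one per
    test made to pass without breaking the regression suite."""
    points = 0
    for kind, passed_review, regressions_green in events:
        if kind == "test_added" and passed_review:
            points += 1
        elif kind == "test_passing" and regressions_green:
            points += 1
    return points

week = [
    ("test_added", True, None),     # reviewed and accepted into the suite
    ("test_added", False, None),    # blocked by manager review: no point
    ("test_passing", None, True),   # made to pass, regression suite still green
    ("test_passing", None, False),  # broke another test: no point
]

print(score(week))  # 2
```

The two "no point" cases are exactly the anti-gaming conditions listed below: manager review filters what counts as a test, and the regression suite filters what counts as passing.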

The developer and manager can jointly improve the system by:

  • jointly agreeing on the important areas of development and testing
  • independently reviewing and running the test suite.
  • deciding not to build a feature that has limited business benefit but would require lots of development and testing. (The most productive line of code is the one you decided not to write because it delivers no business benefit.)
  • partitioning the system into an architecture (such as model-view-controller) which facilitates incremental feature development without breaking the whole system.

The developer cannot game the metric because:

  • redundant tests will be blocked by manager review.
  • fine-grained tests may be blocked by manager review.
  • fine-grained tests will improve the quality of the system.

The manager cannot game the metric because:

  • rejecting too many tests will lead to developer attrition.
  • requesting too many tests will make it hard to refuse a later deadline.

The developer cannot screw the manager because:

  • Each delivered unit of functionality with tests must also pass the regression suite; i.e., this forces the developer to develop carefully without breaking other code.
  • Any claim of work must be provable by passing new tests and regression tests.

The manager cannot screw the developer because:

  • Each requested unit of functionality must include key test cases, and an estimate of the number of test cases needed to finish the work.
  • It's hard to ask for an aggressive schedule and/or free overtime when you are obviously asking for a lot of work.

Another big benefit to the manager is that he can bring on new developers and know that they will not be able to deliver code that silently breaks the system (because the regression test suite catches that).

The big downside to the manager is that it forces him to admit that his system is more complex than it seems on the 1-page description of the feature. The other downside is that the transparency of this method will make it hard to blame developers for business failure.

Jay Godse
  • 1,174
2

Measure in the same way as you are measured by the customer. In terms of functional code, but on a smaller scale.

Make short goals - a week or two - and see if the programmers fulfill the goals, and do so in a satisfactory way.

I would strongly suggest peer review of code, as this allows you to catch bad code up front.

2

How about the rate of sales of the product/service?

Sometimes it's called a commission, or a percentage of the gross.

People buy good products, don't they?

The business wants to sell the product (or maybe a service - same difference for this purpose).

So if that is what you want, measure it.

It's a bit like saying that people who design a car that gets good reviews and sells well have done a good job.

Now adopt this metric and the programming team will want to interact with the sales people, if only to guard against two risks:

  • promising the undeliverable

  • not selling the product to customers effectively

1

Some ideas:

  1. features implemented
  2. bugs fixed (also account for bugs later re-opened by QA)
  3. user complaints resolved (note that this is not the same as 2: one serious bug may be a pain in the neck, while 100 typos may not be that important)
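
Ideas 2 and 3 could be combined into a severity-weighted bug score that withholds credit for re-opened fixes (the weights below are arbitrary assumptions):

```python
# Sketch: weight fixed bugs by severity and penalize re-opens, since one
# serious bug can outweigh a hundred typos. Weights are invented.

SEVERITY_WEIGHT = {"critical": 10, "major": 3, "minor": 1}

def bug_score(fixes):
    """Sum severity weights over fixes that QA did not re-open."""
    total = 0
    for severity, reopened in fixes:
        if not reopened:
            total += SEVERITY_WEIGHT[severity]
    return total

fixes = [
    ("critical", False),
    ("minor", False),
    ("minor", True),   # re-opened by QA: no credit
]

print(bug_score(fixes))  # 11
```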

You may also want to track:

  1. Code coverage by tests
  2. Code coverage by internal documentation
  3. Feature coverage by external (user) documentation
StasM
  • 3,367
0

Writing code/programming is not like putting a hammer to a nail. Much like "writing" in general, it's not something that you can apply typical metrics to - in my opinion.

Couldn't you simply look at their check-ins, or at what they've done through peer review and code review?

Or you know, if they actually produce working code and solutions that fix problems?

-1

Use a methodology whereby the written documentation marries up to the code written. Start the week deciding on what needs to be done, get an agreement, then wait till the end of the week to see if it's been done or not. Keep tasks small and measurable, as in how many days each should take. I don't think you need to measure the programmer's work per se, but the agreement on what is to be delivered and when is a must for control.

The second part of this solution would be peer-to-peer code reviews, backed up by some sort of versioning system that makes it traceable who did what and when. If the consensus is that the code is good, then you're onto a winner; if bad, then find out why and how it could be improved.

Time and motion studies are a no-no as far as I am concerned; some code, such as regexes or some really hard logic, can take days to develop but may only amount to a couple of lines. The only true measurement is deliverables delivered on time, at an agreed time.