5

It took me time T to develop program A, which is measured at 1000 lines of code (SLOC), in a certain language and domain and of a certain complexity. Is there a method to determine how much time it will take to develop program B, which is estimated to be 4000 lines, has the same level of complexity, is in the same domain, and is developed in the same programming language?

I expect that the time it takes me will be greater than 4T. Is there a formula to estimate how T grows as the SLOC count grows?

Thomas Owens
Andrei

8 Answers

15

Applications can't be quantified in terms of LOC - it just doesn't work. Ever. So please, save yourself the hassle and don't do it.

Edit: Unless this is some sort of homework question... in which case the answer the professor is looking for is probably n^2, the professor is a twit, and you should go to a better school.

6

People have developed a number of models to try to estimate things like this. While I wouldn't try to claim that any of them is anywhere close to entirely reliable or accurate, there are a few that seem to take enough factors into account to give halfway reasonable estimates.

Just for one example, Barry Boehm's COCOMO II model seems to fit your situation reasonably well. According to one online implementation, your original 1 KLOC should have taken around 4 person months of effort, and your 4 KLOC should take around 10 person months (for one set of assumptions -- feel free to plug in more appropriate values for the type of development and such).

At the same time, I'd have to agree with others who've pointed out that lines of code is rarely a very good measure (of much of anything). Estimation based on function points (for one possibility) seems rather more accurate to me. Even at best, however, it will take substantially more work, and it may be open to question whether it produces results enough more accurate or reliable to justify that work, especially for a fairly small project like this.

Edit: Oops -- I pasted in the wrong link (that was for the original COCOMO model, not COCOMO II). COCOMO II is a bit more work to use (it might take a minute or two instead of 30 seconds), but produces (what are supposed to be) more accurate results. Online implementations are available. It definitely attempts to take more factors into account in any case (e.g., whether you can re-use any/all of the existing 1000 lines of code in the new project).
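To give a feel for where numbers like that come from, here is a minimal sketch (in Python) of the older basic COCOMO formula rather than COCOMO II, assuming the standard organic-mode coefficients and no effort multipliers, so the absolute numbers differ from the online calculator's; the point is only that effort grows slightly faster than linearly with size:

# Basic COCOMO (Boehm, 1981) -- a simplified sketch, not COCOMO II.
# Organic mode: effort in person-months = a * KLOC^b with a = 2.4, b = 1.05.
def cocomo_basic_effort(kloc, a=2.4, b=1.05):
    """Estimated effort in person-months for a project of `kloc` thousand lines."""
    return a * kloc ** b

for kloc in (1, 4):
    print(kloc, "KLOC ->", round(cocomo_basic_effort(kloc), 1), "person-months")
# Prints roughly 2.4 person-months for 1 KLOC and 10.3 for 4 KLOC:
# quadrupling the size multiplies the estimated effort by a bit more than four.

The exponent b > 1 is what encodes the question's intuition that the 4 KLOC program should take more than 4T.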

Jerry Coffin
3

This is a bit controversial, but for project management, SLOC is typically used for estimating timelines (see, for example, Software Estimation: Demystifying the Black Art (Best Practices (Microsoft))); however, what is underlined time and time again is that you need a large enough data set of similar projects before you can start to notice trends in how long things take to develop. Note that this generally applies to very large code bases as well, and you don't start to see accurate estimates until you are in the 100,000+ SLOC range.

To build on MainMa's driving analogy a bit: if you are driving in a major city and all of your trips are less than 50 km, you might eventually be able to say with some confidence that a trip will take about 30 minutes under normal traffic conditions, but any individual trip might take anywhere between 15 minutes and two hours.

This is similar to trying to estimate how long it will take to write a given function or story point, since not all of them are the same. Resolving a story point that only involves getting some data and converting it to a report might take a couple of hours for someone familiar with the project, whereas trying to improve some underlying queuing code your program is using might take several days. This is where evidence-based scheduling does better: the developer is the one driving the estimate, based on their experience with the given task, and you then adjust that estimate against the historical evidence for that developer, which is why the technique tends to work better for task-level estimation.
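To make the evidence-based scheduling idea a bit more concrete, here is a rough sketch of the Monte Carlo step behind it, assuming a hypothetical history of one developer's actual-to-estimated time ratios (the numbers and names are made up for illustration, not taken from any particular tool):

import random

# Hypothetical history for one developer: actual_time / estimated_time per past task.
# A ratio of 2.0 means a task took twice as long as the developer estimated.
past_ratios = [0.9, 1.1, 1.3, 2.0, 1.0, 1.6, 1.2]

def simulate_completion(estimate_hours, trials=10_000):
    """Scale the new estimate by randomly sampled historical ratios."""
    return sorted(estimate_hours * random.choice(past_ratios) for _ in range(trials))

outcomes = simulate_completion(estimate_hours=8)
for pct in (50, 80, 95):
    # Report completion-time confidence as percentiles rather than a single number.
    print(pct, "th percentile:", round(outcomes[len(outcomes) * pct // 100 - 1], 1), "hours")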

Going back to SLOC: as noted before, it can be used for estimating when a major project will be completed, but only at the large scale; it doesn't scale down very well, it requires historical evidence of similar projects under similar conditions to generate the timeline estimate, and at the end of the day it is really only used as guidance. Returning to the driving analogy, this is similar to long-haul road trips (say, 1,500 km and up): the sheer distance ensures that even though you might hit stretches where you are crawling through traffic, you will also encounter stretches where you can go the speed limit for an extended period. This means that after you have done the trip a couple of times, you can give a pretty reasonable estimate of how fast you averaged and how long it will take to get from point A to point B. Large projects are the same way: the sheer size of the project allows project planners to say, "We have done projects of similar scope in the past; this one will likely be about as big, so the time to complete it will likely be similar."

rjzii
1

If you want your code to have fewer bugs, you should write a lot of automated tests, and write them before and while you write the code, not after a component is ready. There are testing frameworks for different languages and platforms. You can read about Test Driven Development; there are plenty of online and offline resources on the subject.
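As a tiny illustration of the test-first idea using Python's built-in unittest module (the word_count function here is a made-up example, not anything from the question):

import unittest

def word_count(text):
    """Example function under test: count whitespace-separated words."""
    return len(text.split())

class WordCountTest(unittest.TestCase):
    # In TDD these tests are written first and fail until word_count is implemented.
    def test_empty_string_has_no_words(self):
        self.assertEqual(word_count(""), 0)

    def test_counts_whitespace_separated_words(self):
        self.assertEqual(word_count("one two  three"), 3)

if __name__ == "__main__":
    unittest.main()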

escargot agile
1

The time (T) required to develop a program is not only a function of lines of code (SLOC). It is also a function of quality (Q) (and probably of n+1 more variables).

If Q is low, then T grows roughly linearly with SLOC. (You just bang out more lines of code, and it's more or less a physical activity.)

When Q gets higher, T starts to grow exponentially and gets ever closer to infinity. (It's very hard to write totally bug-free code of more than three SLOC.)

So, I think it's almost impossible to estimate T given only SLOC. Maybe, if you are lucky, you might land within +-1 order of magnitude; e.g. you estimate 10 days, and it actually takes somewhere between 1 and 100 days.

Maglob
1

4K lines of simple code may take one tenth of the time to write that 1K lines of complex code does. And 4K lines of complex code may take 40 times longer to write than 1K lines of simple code. The measure is meaningless.

Matthew Read
0
if (x > 0)
{
    i = 1;
}
else 
{
    i = 2;
}

vs

i = (x > 0 ? 1 : 2);

Eight lines vs. one line.

It all depends on how you write the code: whether you can write simple code for simple problems, or whether a developer has to do things in the most complicated way possible.

If your code does four times more, you may reach a point where you can no longer get away with badly designed, unmaintainable code, so it takes five times as long instead of four, but the code quality ends up higher.

gnasher729
-1

You can't simply look at LOC/SLOC by itself the way you are trying to. The only way you can use LOC from previous projects to estimate future project sizes with some degree of success (and as a guideline, not as an infallible rule) is by having a decent number of projects with their SLOC, number of resources (developers), and completion time accounted for. Then you can use that to extrapolate.

But to take just one project, one single project, especially one that is not that big (1 KLOC is fairly small), that's just too little data to use LOC metrics in any meaningful manner.

If this is homework, your professor is a clueless dick, btw.

However, if this is for real, and if you are really that pressed, you could use the following guideline:

EXPECTED_COMPLETION_TIME = 
  ( PREVIOUS_COMPLETION_TIME / PREVIOUS_KLOC ) * EXPECTED_KLOC * SPILLOFF

With SPILLOFF = 1 giving you a 30% chance of success (a 70% chance of failure), SPILLOFF = 1.5 giving you a 60% chance of success (a 40% chance of failure), and SPILLOFF = 2 giving you a 90% chance of success (a 10% chance of failure). The reason for using such estimates is that failures to complete software projects tend to exhibit an exponential distribution with respect to the allocated time (or whatever other resource you choose to measure against).
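Plugging in the question's numbers as a purely hypothetical illustration (assuming, say, that program A took 2 months for its 1 KLOC), the guideline above works out as follows in Python:

def expected_completion_time(previous_time, previous_kloc, expected_kloc, spilloff):
    """Direct translation of the guideline above: scale the past time per KLOC
    by the expected size, then pad by SPILLOFF to buy confidence."""
    return (previous_time / previous_kloc) * expected_kloc * spilloff

# Hypothetical: program A (1 KLOC) took 2 months; program B is expected to be 4 KLOC.
for spilloff in (1.0, 1.5, 2.0):
    months = expected_completion_time(2.0, 1.0, 4.0, spilloff)
    print("SPILLOFF =", spilloff, "->", months, "months")
# 8, 12, and 16 months respectively, with the corresponding success probabilities above.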


When you have consistent work within an organization, or when you work in similar environments where the technology (not just the language, but the whole technology stack) and the processes are uniform, then you can do some estimation, with some margin of error, based on prior projects. In such cases, you want to give more weight to the most recent projects.

Say, for the last n + 1 projects (say n + 1 = 5 or 10... notice, it's n + 1, not n), you could do the following, but only if you carefully keep track of the number of people involved in each project, the actual number of LOC, the actual completion time, and the completion time estimated before the start of the project.

SUM = 0
FOR i = 1 to n
  KLOC_PER_HEAD(i) = KLOC(i) / TEAM_SIZE(i)
  ACTUAL_COST(i) = ACTUAL_COMPLETION_TIME(i) / KLOC_PER_HEAD(i)
  RUNOFF(i) = ACTUAL_COMPLETION_TIME(i) / ORIGINALLY_ESTIMATED_TIME(i)

  SUM = ( ACTUAL_COST(i) * RUNOFF(i) ) + SUM
END 

LAST_COST = ACTUAL_COMPLETION_TIME(n+1) / KLOC_PER_HEAD(n+1)
LAST_RUNOFF = ACTUAL_COMPLETION_TIME(n+1) / ORIGINALLY_ESTIMATED_TIME(n+1)
LAST = LAST_COST * LAST_RUNOFF

ESTIMATE = ( ( (SUM / (n + 1) ) + LAST ) / 2 ) * SPILLOFF

With SPILLOFF as defined previously.
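And, for reference, a minimal runnable translation of that pseudocode in Python; the project history here is invented purely to show the mechanics, and the tuple layout is my own choice:

def estimate_factor(history, spilloff=1.5):
    """Weighted figure from the last n+1 projects, as in the pseudocode above.
    Each entry: (kloc, team_size, actual_completion_time, originally_estimated_time).
    The most recent project (the last entry) gets extra weight."""
    def cost_and_runoff(kloc, team_size, actual, estimated):
        kloc_per_head = kloc / team_size
        return actual / kloc_per_head, actual / estimated

    *older, latest = history   # first n projects, then project n+1 (most recent)
    total = sum(cost * runoff for cost, runoff in (cost_and_runoff(*p) for p in older))
    last_cost, last_runoff = cost_and_runoff(*latest)
    return ((total / len(history)) + last_cost * last_runoff) / 2 * spilloff

# Invented history of four past projects: (KLOC, team size, actual months, estimated months).
history = [(10, 4, 12, 9), (6, 3, 8, 8), (15, 5, 20, 14), (8, 4, 10, 9)]
print(round(estimate_factor(history), 2))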

luis.espinal