
Recently, I've been getting familiar with the COCOMO II model to calculate the effort (person-months) and duration (calendar time in months) required to develop a mobile application. To calculate the effort, the following formula should be used, according to the COCOMO II Model Definition Manual.

PM = A * Size^E * (Product of EMs)

In the equation, Size should be expressed in KSLOC (thousands of source lines of code). How am I supposed to know the KSLOC of my mobile application when not a single line of code has been written yet? Do people usually just make an educated guess, or do they, for example, use similar projects as a basis for their estimation? Or are there specific methods for estimating the SLOC in situations where you don't have any code yet?
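
Just to make sure I'm reading the formula correctly, here is how I would compute the effort once Size is known. The constants and multipliers below are placeholders for illustration, not values taken from the manual:

    # Illustrative only: the constants and multipliers are placeholders,
    # not the calibrated values from the COCOMO II manual.
    A = 2.94                      # productivity constant
    E = 1.10                      # scale exponent derived from the scale factors
    size_ksloc = 20.0             # the size estimate I don't know how to obtain yet
    effort_multipliers = [1.10, 0.95, 1.00]   # selected cost drivers

    product_of_ems = 1.0
    for em in effort_multipliers:
        product_of_ems *= em

    pm = A * size_ksloc ** E * product_of_ems   # person-months
    print(f"Estimated effort: {pm:.1f} person-months")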

JZ555

3 Answers


The only way to get the SLOC input to COCOMO is to estimate it. There are different estimation approaches that you can use to come up with a size estimate for the software application under design. You can consider decomposition and recomposition, estimation by analogy, proxy-based estimates, and expert judgement in groups to produce the size that you feed into COCOMO.

The method you choose is likely to be determined by your environment. For example, if you don't have a group of knowledgeable people, you really can't use expert judgement in groups. If you do, you can use any number of techniques (including the other ones that I mention) as individual inputs to the group estimation effort.

If you have good historical projects, then you can use estimation by analogy or proxy-based estimates to compare with previous projects or components. If you have a previously made component that is similar in size and scope, then you can count that. However, if you are using a new programming language or don't have a good history of past estimates, then the best method would be to keep decomposing the work into smaller, easily estimated pieces, estimate those, and then roll the total back up.
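
To make the roll-up concrete, here is a minimal sketch, assuming you have broken the application into components and put a rough low/likely/high SLOC guess on each. All component names and numbers are invented for illustration:

    # Hypothetical decomposition of a mobile app, each piece with a rough
    # low / likely / high SLOC guess. All names and numbers are invented.
    components = {
        "login & account management": (800, 1200, 2000),
        "data sync layer":            (1500, 2500, 4000),
        "UI screens":                 (3000, 5000, 9000),
    }

    low, likely, high = (sum(c[i] for c in components.values()) for i in range(3))
    print(f"Rolled-up size: {likely / 1000:.1f} KSLOC "
          f"(somewhere between {low / 1000:.1f} and {high / 1000:.1f} KSLOC)")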

Something else to consider would be to use function points in COCOMO II instead of SLOC. Given a set of requirements, it may be easier to count the number of function points rather than attempt to estimate the number of SLOC.
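
If you go that route, the conversion into the size input is roughly the following. The SLOC-per-function-point ratio differs per language, and the value below is only a placeholder, not an authoritative figure:

    # Converting an (unadjusted) function point count into the KSLOC input
    # that COCOMO II expects. The ratio below is a placeholder; look up the
    # published figure for your target language.
    unadjusted_function_points = 120   # counted from the requirements
    sloc_per_fp = 50                   # language-dependent conversion ratio

    size_ksloc = unadjusted_function_points * sloc_per_fp / 1000
    print(f"Size input for COCOMO II: {size_ksloc:.1f} KSLOC")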

Thomas Owens

First, you should probably be aware that this has been cited as a problem with COCOMO for some time. A fair number of people think that alternatives, such as function point analysis, make more sense (and some at least claim that experience bears out its superiority).

That said, it doesn't seem to me that you've really given separate alternatives. On one hand you note: "educated guess." On the other, you point to "use similar projects as a basis for their estimation."

I think these are really the same thing. Looking at similar projects is basically just a method of educating (or maybe re-educating) yourself on project sizes. Realistically, though, for most people this is a nearly necessary step. At least in my experience, most contributors on most projects don't really know how many lines of code ended up in that project without doing some looking (and given things like code reuse, that usually involves quite a bit more than running wc on the project's directory tree(s), too).

My own take: lines of code do more to distract than help. Looking at a previous project and saying: "that took about six months, and this is around twice the size, so it'll probably take around a year" tends to be at least as accurate, with a lot less work. If you want to put in more work to refine that, then yes, I've found function points to be a reasonable approach--it's usually fairly straightforward to translate a set of requirements into at least some notion of function points (and if you can't, it's usually because the requirements lack the detail necessary to give a meaningful estimate).
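
If you want that arithmetic spelled out, it really is no more than this (all numbers are made up); the optional refinement just replaces the gut-feel size ratio with a ratio of function point counts:

    # The whole "analogy" calculation, with made-up numbers.
    previous_duration_months = 6.0
    relative_size = 2.0            # gut feel: "around twice the size"

    # Optional refinement: derive the ratio from function point counts
    # instead of gut feel (both counts invented).
    previous_fp, new_fp = 180, 340
    relative_size = new_fp / previous_fp

    print(f"Ballpark: about {previous_duration_months * relative_size:.0f} months")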

Jerry Coffin

I will provide an answer that will certainly not please you, but I have to draw your attention to some points.

In computer science, as in any science, before using a mathematical tool you need to thoroughly assess whether your intended use falls within the tool's domain of validity. Are you sure that's the case here?

Scientific foundation of COCOMO

Statistical estimation models such as COCOMO and the like are based on assumptions about the characteristics of the software and the development environment, as well as on statistics gathered from a large base of existing projects. The idea is that items sharing the same characteristics are relatively homogeneous.

For instance, the initial COCOMO was developed using regression analysis on a limited set of projects in one industrial branch. Of course, COCOMO II enlarges the base and the criteria taken into account. But it is still based on a statistical model that assumes a large historical base. The manual you are referring to reminds you of this:

... assumptions about what life-cycle phases and labor categories are covered by its effort and schedule estimates. These and other definitions (...) were used in collecting all the data to which COCOMO II has been calibrated. If you use other definitions and assumptions, you need to either adjust the COCOMO II estimates or recalibrate its coefficients.
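
To give an idea of what "recalibrate its coefficients" means in practice, here is a minimal sketch, assuming you have a handful of completed projects of your own with known size and effort (the data points are invented):

    import numpy as np

    # Invented historical data: (KSLOC, person-months) of completed projects.
    sizes = np.array([5.0, 12.0, 30.0, 55.0])
    efforts = np.array([14.0, 38.0, 110.0, 230.0])

    # PM = A * Size^E is linear in log space: ln(PM) = ln(A) + E * ln(Size),
    # so an ordinary least-squares line fit yields E (slope) and A (intercept).
    E, ln_A = np.polyfit(np.log(sizes), np.log(efforts), 1)
    print(f"Recalibrated A = {np.exp(ln_A):.2f}, E = {E:.2f}")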

The method was updated in 2000. In other words, it dates from prehistory. So it could not take into account the technological shifts in development tools and project management, nor the specific characteristics of mobile development.

Do you really think that such a method will TODAY give you more precise estimates than a crystal ball found at a flea market? -- sorry for being so direct --

The function point alternative

The function point method is something that is IMO simpler to apply than COCOMO. However, it also has its limitations:

  • first, it estimates the overall effort mainly from the external characteristics of the project. This doesn't take into account the boost you could achieve through software reuse in an object-oriented world
  • again, it is based on the accumulation of statistics over a large number of projects in the technology you're targeting.

Again, do you have valid historical data here?

Other approaches

Software estimation has always been a delicate subject. There are a couple of methods that are less formal but that provide usable numbers. For example:

  • analogous estimating: you take a similar project, see how close and how different the two are, and adapt the estimate accordingly. The funny thing here is that you evaluate the differences recursively: you could estimate each difference based on similar differences measured elsewhere, or you could use the other evaluation methods (including FP and COCOMO), but the level of uncertainty will be lower than starting the estimate from scratch, because you already have a common base which should in principle cover a significant part of the whole. In your context, if there's no in-house mobile experience, forget about this (unless you can find some existing open source projects and measure their size as a starting point - but this supposes someone with enough experience to estimate the differences).
  • delphi method: you take a panel of experts (e.g. senior developers and/or project managers with experience in the kind of project you envisage, possibly independent third parties such as experts from collaborating business partners) and come up with a group estimate. The key here is that the experts do not interact directly, so as not to influence each other; their individual estimates are aggregated.
  • variants where the experts interact directly are possible as well.
  • you can also borrow some risk management techniques and work, for instance, with a best-case and a worst-case scenario to assess the range of uncertainty (a small sketch follows this list).
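
A minimal sketch of that last point, assuming you can put a best, most likely and worst figure on the effort. The numbers are invented, and the spread rule of thumb comes from classical three-point (PERT-style) estimating:

    # Turn a best / most-likely / worst guess into an expected value and a
    # rough uncertainty range. All numbers are invented.
    best, likely, worst = 8.0, 12.0, 20.0   # person-months

    expected = (best + 4 * likely + worst) / 6
    sigma = (worst - best) / 6              # classical rule of thumb

    print(f"Expected: {expected:.1f} PM, "
          f"range roughly {expected - sigma:.1f} to {expected + sigma:.1f} PM")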

In my own experience, these methods are extremely effective. The analogous method is good when there is significant historical data for similar projects. But when entering a new field, the delphi approach (possibly with different trust coefficients for the different participants, according to their respective success track record) is, I think, the most effective. This is because the experts take into account all their knowledge of the project, including their confidence in team effectiveness, the development methods used, and the degree of uncertainty and margin they think is necessary.
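
For the delphi aggregation with trust coefficients, the mechanics can be as simple as a weighted average; all names, estimates and weights below are invented:

    # Aggregating individual delphi estimates, weighting each expert by a
    # "trust" coefficient based on their past estimation accuracy.
    estimates = {"senior dev A": 10.0, "senior dev B": 14.0, "external PM": 18.0}
    trust     = {"senior dev A": 1.0,  "senior dev B": 0.8,  "external PM": 0.5}

    weighted = sum(estimates[e] * trust[e] for e in estimates) / sum(trust.values())
    print(f"Aggregated estimate: {weighted:.1f} person-months")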

Conclusion

Some people claim all this is guesstimation. They are fully right. But it is better to start with an approximate guess to size the team and get a budget than with no clue at all. In addition, estimating is not an exact science, and there are many potential sources of error. So estimates must be reassessed periodically during the project.

Christophe