
Have there been any studies on the effectiveness of statically vs. dynamically typed languages?

In particular:

  • Measurements of programmer productivity
  • Defect rate

I'm also interested in the effects of whether or not unit testing is employed.

I've seen lots of discussion of the merits of either side but I'm wondering whether anyone has done a study on it.

Winston Ewert

7 Answers


Some suggested reading:

Not exactly on static typing, but related:

Some interesting articles or essays on the subject or on static analysis of programs in general:

And for the ones who would be wondering what this is all about:

However, I doubt any of these will give you a direct answer, as they don't do exactly the study you're looking for. They will be interesting reads, though.

Personally, I firmly believe that static typing makes bug detection easier than dynamic typing does. I spend way too much time hunting for typos and other minor mistakes like these in JavaScript or even Ruby code.

As for the view that dynamic typing gives you a boost in productivity, I think it mostly comes down to tooling. If statically typed languages have the right tools to allow background recompilation and provide a REPL interface, then you get the benefits of both worlds. Scala provides this, for instance, which makes it very easy to learn and prototype in the interactive console while still giving you the benefits of static typing (and of a stronger type system than a lot of other languages, ML-family languages aside). Similarly, I don't think I suffer a loss of productivity from the static typing of Java or C++, as long as I use an IDE that helps me along. When I revert to coding with only a simple setup (editor + compiler/interpreter), it feels more cumbersome and dynamic languages seem easier to use. But you still hunt for bugs.

I guess people would say the tooling argument is reversible: if tooling were better for dynamic languages, then most bugs and typos would be pointed out at coding time. That's true, but in my opinion it only reflects the flaw in the system. Still, I usually prototype in JRuby and later rewrite most of what I do in Java.
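To make the typo point concrete, here is a minimal Python sketch (a hypothetical function of my own, not taken from any of the links above): the misspelled name only fails at runtime, and only on the code path that uses it, whereas a compiler, or a static checker such as mypy, rejects it before the program ever runs.

    def total_price(items, discount=0.0):
        total = 0.0
        for item in items:
            total += item["price"]
        if discount:
            total -= total * discuont  # typo for "discount": only fails when this branch runs
        return total

    print(total_price([{"price": 10.0}]))           # fine: the buggy branch is never taken
    # total_price([{"price": 10.0}], discount=0.1)  # NameError: name 'discuont' is not defined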

WARNING: Some of these links are unreliable, and some go through portals of various computing societies that offer fee-based access to members. Sorry about that; I tried to find multiple links for each of these, but it's not as good as I'd like it to be.

haylem

Just yesterday I found this study: "Unit testing isn't enough. You need static typing too."

Basically, the author used a tool that automatically converts a project from a dynamically typed language into a statically typed one (Python to Haskell).

Then he selected a number of open-source Python projects that also included a reasonable number of unit tests, and automatically converted them to Haskell.

The translation to Haskell revealed a series of type errors in variables that had not been discovered by the unit tests.
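To illustrate the kind of bug such a translation surfaces, here is a hypothetical Python sketch of my own (not an example from the study): the asserts below pass and together cover every line, yet the function can return two different types, which the Haskell type checker would reject outright.

    def parse_duration(text):
        """Return a duration in seconds."""
        if text.endswith("s"):
            return int(text[:-1])
        return text  # bug: bare numbers fall through as str, not int

    # Both asserts pass, with 100% line coverage between them:
    assert parse_duration("30s") == 30
    assert parse_duration("90")  # "90" is truthy, so the test "passes"

    # The error only surfaces later, far from its cause:
    #   parse_duration("30s") + parse_duration("90")  ->  TypeError
    # A Haskell version would not even compile: one branch returns an Int,
    # the other a String.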

PBrando
  • Discussion of the ACM paper "An Experiment About Static and Dynamic Type Systems" (2010) by Stefan Hanenberg (referenced by Lorin Hochstein in another answer).
  • Conclusion: Productivity for similar quality was higher in a dynamic language.
  • Potential biases/validity issues: Experimental subjects were all students. Also, limited variety of the programming tasks (subjects were asked to implement a scanner and parser).
  • ACM paper "Do Programming Languages Affect Productivity?" (2007) by Delorey, Knutson, and Chun.
  • Conclusion: JavaScript, Tcl, and Perl were more productive than C#, C++, and Java; Python and PHP fell in the middle.
  • Potential biases/validity issues: No measure of quality (such as bugs discovered post-release). No measure of reliability (is software written in statically typed languages more dependable?). Sample bias: all projects were taken from open-source CVS repositories. Also, no distinction between weakly and strongly typed languages (i.e., pointers).
  • Thesis "Empirical Study of Software Productivity and Quality" (2008) by Michael F. Siok.
  • Conclusion: Choice of programming language does not significantly influence productivity or quality. However, it does affect labor costs and "quality within the overall software projects portfolio".
  • Potential biases/validity issues: Restricted to avionics domain. Programming languages could have all been statically typed. I didn't read the thesis, so I cannot evaluate its rigor.
My opinion: Although there is weak evidence that dynamically typed languages are more productive, it is not conclusive, because (1) many factors were not controlled, (2) there are too few studies, and (3) there has been little or no discussion of what constitutes an appropriate test method.
ahoffer

Here's a starting point:

The paper challenges the commonly received wisdom that, all else being equal, programmers write the same number of lines of code per unit time regardless of language. In other words, it should serve as empirical evidence that mechanical productivity (lines of code written) is not a good measure of functional productivity, and must at least be normalized by language.
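For instance (illustrative numbers, not taken from the paper): if the same feature takes 300 lines of Java but 100 lines of Python, and both programmers write 30 lines per hour, then raw lines-per-hour rates them as equally productive, even though the Python programmer delivered the feature in a third of the time.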

gnat

I found "Static vs. dynamic languages: a literature review", which lists some studies on the subject and gives a nice summary of each.

Here's the executive summary:

Of the controlled experiments, only three show an effect large enough to have any practical significance: the Prechelt study comparing C, C++, Java, Perl, Python, Rexx, and Tcl; the Endrikat study comparing Java and Dart; and Cooley's experiment with VHDL and Verilog. Unfortunately, they all have issues that make it hard to draw a really strong conclusion.

In the Prechelt study, the populations were different between the dynamically and statically typed languages, and the conditions for the tasks were also different. There was a follow-up study that illustrated the issue by inviting Lispers to come up with their own solutions to the problem, which involved comparing folks like Darius Bacon to random undergrads. A follow-up to the follow-up literally involved comparing code from Peter Norvig to code from random college students.

In the Endrikat study, they specifically picked a task where they thought static typing would make a difference, and they drew their subjects from a population where everyone had taken classes using the statically typed language. They don’t comment on whether or not students had experience in the dynamically typed language, but it seems safe to assume that most or all had less experience in the dynamically typed language.

Cooley’s experiment was one of the few that drew people from a non-student population, which is great. But, as with all of the other experiments, the task was a trivial toy task. While it seems damning that none of the VHDL (static language) participants were able to complete the task on time, it is extremely unusual to want to finish a hardware design in 1.5 hours anywhere outside of a school project. You might argue that a large task can be broken down into many smaller tasks, but a plausible counterargument is that there are fixed costs using VHDL that can be amortized across many tasks.

As for the rest of the experiments, the main takeaway I have from them is that, under the specific set of circumstances described in the studies, any effect, if it exists at all, is small.

Moving on to the case studies, the two bug finding case studies make for interesting reading, but they don’t really make a case for or against types. One shows that transcribing Python programs to Haskell will find a non-zero number of bugs of unknown severity that might not be found through unit testing that’s line-coverage oriented. The pair of Erlang papers shows that you can find some bugs that would be difficult to find through any sort of testing, some of which are severe, using static analysis.

As a user, I find it convenient when my compiler gives me an error before I run separate static analysis tools, but that’s minor, perhaps even smaller than the effect size of the controlled studies listed above.

I found the 0install case study (which compared various languages to Python and eventually settled on OCaml) to be one of the more interesting things I ran across, but it's the kind of subjective thing that everyone will interpret differently, as you can see by looking at how people react to it.

This fits with the impression I have (in my little corner of the world, ACL2, Isabelle/HOL, and PVS are the most commonly used provers, and it makes sense that people would prefer more automation when solving problems in industry), but that’s also subjective.

And then there are the studies that mine data from existing projects. Unfortunately, I couldn’t find anybody who did anything to determine causation (e.g., find an appropriate instrumental variable), so they just measure correlations. Some of the correlations are unexpected, but there isn’t enough information to determine why.

The only data-mining study that presents data that's potentially interesting without further exploration is Smallshire's review of Python bugs, but there isn't enough information on the methodology to figure out what his study really means, and it's not clear why he hinted at looking at data for other languages without presenting the data.

Some notable omissions from the studies are comprehensive studies using experienced programmers, let alone studies that have large populations of “good” or “bad” programmers, looking at anything approaching a significant project (in places I’ve worked, a three month project would be considered small, but that’s multiple orders of magnitude larger than any project used in a controlled study), using “modern” statically typed languages, using gradual/optional typing, using modern mainstream IDEs (like VS and Eclipse), using modern radical IDEs (like LightTable), using old school editors (like Emacs and vim), doing maintenance on a non-trivial codebase, doing maintenance with anything resembling a realistic environment, doing maintenance on a codebase you’re already familiar with, etc.

If you look at the internet commentary on these studies, most of them are passed around to justify one viewpoint or another. The Prechelt study on dynamic vs. static typing, along with the follow-ups on Lisp, is a perennial favorite of dynamic language advocates, and the GitHub mining study has recently become trendy among functional programmers.

Mr.WorshipMe

I honestly do not think that static vs. dynamic typing is the real question.

I think that there are two parameters that should come first:

  • the expertise level in the language: the more experienced you are, the more you know about the "gotchas" and the more likely you are to avoid them or track them down easily. The same goes for the particular application/program you are working on.
  • testing: I love static typing (hell, I like programming in C++ :p), but there is only so much that a compiler / static analyzer can do for you. It's just impossible to be confident about a program without having tested it. And I am all for fuzz testing (when applicable), because you just can't think of all possible input combinations by hand; see the sketch after this list.
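Here is a minimal fuzz-testing sketch in Python (the function under test is hypothetical): instead of enumerating inputs by hand, throw thousands of random strings at the function and assert properties that must hold for every input.

    import random
    import string

    def strip_comment(line):
        """Hypothetical function under test: drop everything from '#' onward."""
        return line.split("#", 1)[0]

    for _ in range(10_000):
        length = random.randint(0, 40)
        line = "".join(random.choice(string.printable) for _ in range(length))
        out = strip_comment(line)
        assert "#" not in out        # no comment text survives
        assert line.startswith(out)  # output is a prefix of the input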

If you are comfortable in the language, you'll write code and you'll track down bugs with ease.

If you write decoupled code and test each functionality extensively, then you'll produce well-honed code, and thus you'll be productive (because you cannot qualify as productive if you do not assess the quality of the product, can you?).

I would therefore deem that the static vs dynamic debate with regard to productivity is quite moot, or at least vastly superseded by other considerations.

Matthieu M.

Here are a few:

  • Stefan Hanenberg. 2010. "An experiment about static and dynamic type systems: doubts about the positive impact of static type systems on development time." In Proceedings of the ACM International Conference on Object-Oriented Programming Systems Languages and Applications (OOPSLA '10). ACM, New York, NY, USA, 22-35. doi:10.1145/1869459.1869462, http://doi.acm.org/10.1145/1869459.1869462

  • Daniel P. Delorey, Charles D. Knutson, and Scott Chun. 2007. "Do Programming Languages Affect Productivity? A Case Study Using Data from Open Source Projects." In First International Workshop on Emerging Trends in FLOSS Research and Development (FLOSS '07: ICSE Workshops 2007), p. 8.

  • M. Daly, V. Sazawal, and J. Foster. 2009. "Work in Progress: An Empirical Study of Static Typing in Ruby." In Workshop on Evaluation and Usability of Programming Languages and Tools (PLATEAU) at ONWARD 2009.

  • Lutz Prechelt and Walter F. Tichy. 1998. "A Controlled Experiment to Assess the Benefits of Procedure Argument Type Checking." IEEE Transactions on Software Engineering 24, 4 (April 1998), 302-312. doi:10.1109/32.677186, http://dx.doi.org/10.1109/32.677186