On models and validation in release planning research

During this vacation season I’ve had time to think about the current state of software release planning research. The great majority of publications on the subject propose some new model or method for release planning, in which release planning is formulated as a mathematical optimization problem. The goal might be, for example, to generate the plan that produces the most value given known resources and constraints such as schedule. In “Release planning under fuzzy effort constraints” (An Ngo-The, G. Ruhe, and Wei Shen, Proceedings of the Third IEEE International Conference on Cognitive Informatics, pages 168–175, 2004), the goal is to maximize the total value of the release plan to stakeholders, who vote on the value of the individual requirements based on three different factors.
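To make the optimization framing concrete, the core of such models can be sketched as a 0/1 knapsack problem: choose the subset of candidate requirements that maximizes total stakeholder value without exceeding an effort budget. The requirement names, values, and effort figures below are purely hypothetical, and real models (such as the fuzzy-effort one cited above) are considerably more elaborate:

```python
from itertools import combinations

# Hypothetical candidate requirements: (name, stakeholder value, effort in person-days).
# All numbers are illustrative, not taken from any cited study.
requirements = [
    ("login", 8, 5),
    ("reporting", 11, 7),
    ("export", 6, 4),
    ("search", 9, 6),
]
effort_budget = 12  # capacity of the release

# Exhaustive search over subsets: pick the one with maximum total value
# whose total effort fits the budget (the classic 0/1 knapsack).
best_value, best_plan = 0, ()
for r in range(len(requirements) + 1):
    for subset in combinations(requirements, r):
        effort = sum(req[2] for req in subset)
        value = sum(req[1] for req in subset)
        if effort <= effort_budget and value > best_value:
            best_value, best_plan = value, subset

print(best_value, [req[0] for req in best_plan])
```

Brute force is only feasible for toy inputs like this; published models typically replace it with integer programming or heuristic search, and differ mainly in how value and effort are estimated.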

Svahnberg et al. published an excellent systematic review of such release planning models (“A systematic review on strategic release planning models”, Information and Software Technology, Volume 52, Issue 3, pages 237–248). Most notable in the results of the review is the level of empirical validation of the models: only two of the 24 models were validated in full-scale industrial use. According to the review, most of the models were validated by case studies.

Here is the first important point. The term case study carries quite a different meaning in release planning research than in, for example, social research, and it is used in several distinct senses. What is called a case study might simply be a simulated run of a model using data from a past (real) software development project. It might be an attempt to use the model in the context of re-developing an existing software system and then comparing the results to the initial attempt. Or it might be an attempt to use the method in a real industrial software development project and gather feedback about the method, without reporting how the method actually affected the project’s plans, success, or output. Unfortunately, this (mis)use of the term makes assessing the validation quality of release planning models difficult. For the three examples above, I would use the terms simulation, quasi-experiment, and perhaps industrial evaluation, respectively. In my humble opinion, a case study should ask a “how” or “why” question about a contemporary set of events over which the investigator has little or no control (Robert K. Yin, Case Study Research, 1994).

This leads us to the second, bigger problem regarding models and methods in software release planning research. The great majority of models and methods are simply not validated in any kind of realistic (industrial) setting. Even when a validation is published along with the description of a method, that validation is almost never replicated. I have never seen a replication of a release planning model validation, although Svahnberg et al. list 24 different models and 28 articles published between 1997 and 2008. Instead of validating existing models, even their own, model-based release planning researchers seem to prefer building and publishing new, ever more complex models. From what I have understood in discussions with my colleagues, researchers’ access to the real software industry is very limited in many countries, and especially difficult in the USA; this might be one reason for the low level of validation. Another might be that many SE journals (still) prefer to publish mathematical models over empirical research.

With the current state of validation, I feel that progress in release planning research has stalled. New models and optimization algorithms are certainly still being published. But without proper validation and, more importantly, empirical comparison (preferably through real longitudinal industrial experiments) between previously proposed models, the software industry has little to gain from current release planning research. I have to agree with Karl E. Wiegers, who wrote an excellent and still topical article in 1998: “Read My Lips: No New Models” (IEEE Software, September/October 1998).