Reducing Wasted Life Science Research By Using Factorial Experiments

November 18, 2019

Contributed Commentary by Bert Gunter

November 18, 2019 | In his October 28, 2019 Bio-IT World commentary, Sridhar Iyengar references a 2015 PLOS Biology estimate that $28 billion is "wasted" every year on failed or irreproducible preclinical life science research. To combat such waste, he advocates building comprehensive lab monitoring and information systems that, using "AI", can "correlate fluctuations in many different variables" to deviations in observed results. In this way, one can discover "unknown unknowns" that are affecting research experiments and processes so that they can be predicted and eliminated.

I think few would disagree with such recommendations. The issue, as Iyengar says, is their practicability. There is, however, a completely different approach to teasing out, from the "myriad factors," those that can cause irreproducible experiments. It is relatively simple to implement, requires no additional systems, and has been used successfully in many industrial and academic research environments for over 75 years: factorial experimentation, in particular, 2-level screening experiments.

Briefly, a factorial experiment is one in which multiple experimental variables, such as temperature, humidity, machine maintenance, and reagent lots—i.e., those "myriad factors" that Iyengar is concerned with—are changed simultaneously in a carefully prespecified sequence of "runs," where a "run" is a fixed combination of settings of the experimental variables being investigated.
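To make this concrete, here is a minimal sketch in Python; the factor names and coded levels are my own illustrative assumptions, not taken from Iyengar's commentary or any particular study. It simply enumerates the runs of a 2-level full factorial design for three factors:

```python
# A minimal sketch (illustrative factors, not a prescription): enumerate
# the runs of a 2-level full factorial design for three factors.
from itertools import product

# Each factor is varied between a "low" (-1) and "high" (+1) setting.
factors = {
    "temperature": (-1, +1),   # e.g., 20 C vs 25 C
    "humidity":    (-1, +1),   # e.g., 30% vs 60% relative humidity
    "reagent_lot": (-1, +1),   # e.g., lot A vs lot B
}

# A "run" is one fixed combination of factor settings; the full
# factorial visits every combination: 2^3 = 8 runs.
names = list(factors)
for run, settings in enumerate(product(*factors.values()), start=1):
    print(f"run {run}: " + ", ".join(f"{n}={s:+d}" for n, s in zip(names, settings)))
```

Each printed line is one run; with three factors at two levels each, the full factorial visits all 2^3 = 8 combinations.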

There are a number of features of this approach that deserve emphasis. First, the expert knowledge of laboratory personnel is required to choose the variables to study (as would also be the case for choosing variables to monitor). These variables must be controllable by the experimenters, at least during the experiment: it is precisely this deliberate control and manipulation that yields the information to determine their effects, if any.

Second, unlike the standard experimental paradigm of varying only one factor at a time (OFAT) while holding the others constant, factorial design requires that multiple factors be changed simultaneously, but in a rigorously determined and pre-defined pattern of runs. This not only allows the individual effects of the factors to be disentangled far more informatively than OFAT, but also provides evidence of possible interactions among the factors, which is effectively impossible with OFAT. Such interactions—e.g. problems when both temperature and humidity are high, but not when only one is—rather than individual variable effects, can frequently be the cause of irreproducibility. It is not clear that Iyengar's AI-driven search for "correlations" can do this.
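To see how an interaction is estimated, consider a hedged sketch with invented numbers: a 2x2 factorial in temperature and humidity where a hypothetical assay yield drops only when both are high. All three effects come from the same four runs:

```python
# A hedged illustration (simulated data, not from any real experiment):
# in a 2x2 factorial, the interaction effect is estimated from the same
# four runs that give the main effects, which OFAT cannot do.
import numpy as np

# Design columns for a 2^2 factorial: temperature (T) and humidity (H)
# at coded levels -1/+1; the interaction column is the product T*H.
T = np.array([-1, +1, -1, +1])
H = np.array([-1, -1, +1, +1])
TH = T * H

# Hypothetical response: assay yield drops only when BOTH T and H are high.
y = np.array([98.0, 97.0, 97.5, 85.0])

# Each effect = mean(y at +1) - mean(y at -1) for the corresponding column.
for name, col in [("T", T), ("H", H), ("T*H", TH)]:
    effect = y[col == +1].mean() - y[col == -1].mean()
    print(f"effect {name}: {effect:+.2f}")
```

In this invented data set the T*H interaction effect is comparable in size to the two main effects, and an OFAT sequence that never visits the high-high combination would miss the problem entirely.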

Finally, the power and remarkable economy of such screening experiments allow great flexibility in adapting to practical needs and constraints. One can study up to 7 factors in only 8 experimental runs; up to 11 in 12 runs; up to 15 in 16 runs; and so forth. The only limitation is the imagination of researchers in identifying factors with which to experiment and their ability to control and manipulate them in the experimental process.
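The 7-factors-in-8-runs case may sound implausible, so here is one standard construction, sketched in Python under the same illustrative assumptions as before: start from a 2^3 full factorial in three base factors and assign its interaction columns to four additional factors, yielding a saturated 2^(7-4) fractional factorial.

```python
# A sketch of how 7 factors fit into 8 runs: a saturated 2^(7-4)
# fractional factorial, built by assigning the interaction columns of a
# 2^3 full factorial (in A, B, C) to four extra factors:
# D = AB, E = AC, F = BC, G = ABC.
from itertools import product

rows = [(a, b, c, a*b, a*c, b*c, a*b*c)
        for a, b, c in product((-1, +1), repeat=3)]  # base 2^3 design: 8 runs

header = ("A", "B", "C", "D", "E", "F", "G")
print(" run  " + "  ".join(header))
for i, row in enumerate(rows, start=1):
    print(f"{i:4d}  " + "  ".join(f"{v:+d}" for v in row))
```

Every column is balanced and orthogonal to every other, which is what lets eight runs screen all seven main effects, at the cost of confounding those main effects with interactions, the usual trade-off in screening designs.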

This is not the place to lay out in detail how this methodology works. I have written a brief and hopefully accessible introduction to the basic ideas specifically in the context of experimental irreproducibility here: https://arxiv.org/abs/1807.05944. More detailed discussions can be found in the literature, including the references listed therein.

One of the perceived impediments to implementing such experimental methodology is that it involves statistical complexity beyond the training of biology researchers. While this may have been true in the distant past, before computers and modern software interfaces, it is certainly no longer the case. A colleague and I have written a short "handbook" (Dan Coleman and Bert Gunter, A DOE Handbook, available on Amazon) that explains the basics and provides a "catalogue" of designs such as those mentioned above. The data "analyses" consist entirely of simple calculations and plots that are easily implemented—if they are not already present—in practically any scientific/graphics software. This is typically the story for well-designed and well-executed experiments: the results are clear and direct, and do not require fancy math to obtain.
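As an illustration of just how simple those calculations are, here is a sketch (again in Python, with responses invented purely for illustration) that estimates all seven effects from the 8-run design above (each effect is just a difference of two averages) and displays them with an ordinary bar chart, here drawn with matplotlib as one possible choice of plotting tool:

```python
# A minimal sketch of the "simple calculations" referred to above:
# effect estimates for the 8-run, 7-factor design, using simulated
# responses (the numbers are invented for illustration).
import numpy as np
import matplotlib.pyplot as plt
from itertools import product

# Rebuild the saturated 8-run design from the previous sketch.
design = np.array([(a, b, c, a*b, a*c, b*c, a*b*c)
                   for a, b, c in product((-1, +1), repeat=3)])
names = ["A", "B", "C", "D", "E", "F", "G"]

# Invented responses in which factors A and B matter and the rest is noise.
y = np.array([50.1, 49.8, 58.7, 59.2, 54.9, 55.3, 63.8, 64.1])

# Each effect is just a difference of two averages.
effects = [y[design[:, j] == +1].mean() - y[design[:, j] == -1].mean()
           for j in range(7)]

# A simple bar chart of effect sizes is often all the "analysis" needed.
plt.bar(names, effects)
plt.ylabel("estimated effect")
plt.title("Effect estimates from the 8-run screening design (simulated)")
plt.show()
```

In this invented data set the bars for A and B dominate, flagging those two factors for follow-up, while the remaining effects hover near zero, consistent with noise.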

It is worth emphasizing that none of this should be viewed as diminishing the value of Iyengar's recommendations. Screening experiments can help determine how key factors affect product variation or experimental results and can even indicate the operational ranges within which they must be maintained. But ongoing monitoring is necessary to assure that this is being done and to identify possible new sources of variability that must be investigated further when problems arise. As Iyengar says, "Science is complex, … [with a] vast array of [possible] causes of failed experiments." Given this complexity, it behooves us to use all the tools at our disposal. Two-level factorial screening experiments should be part of that toolkit.

Bert Gunter is a retired statistician who over his career worked with scientists and engineers in R&D, manufacturing, and quality control in a variety of industries, including over 20 years in pharmaceuticals. He is an elected Fellow of the American Statistical Association. He can be reached at bgunter.4567@gmail.com.