Simulation

“…all models are wrong, but some are useful.”
George E. P. Box, Statistician

Stochastic models are typically implemented as Monte Carlo simulations, which use random number generators to model chance or random events. Advances in computing technology and the associated development of computer-intensive simulation methods have had profound effects on statistical science. Research questions can now frequently be answered by simulating the physical process of gathering data rather than by making actual observations. Exponent statisticians use Monte Carlo simulation for a variety of purposes: to make statistical inferences, to understand the characteristics of statistical procedures, and to approximate the behavior of complex and random physical processes and systems.
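
As a minimal illustration of the idea (the example and the choice of Python are ours, not drawn from any particular Exponent project), the sketch below uses a pseudorandom number generator to estimate a simple probability by repeating the underlying random experiment many times and averaging the outcomes.

```python
# Minimal Monte Carlo sketch: estimate the probability that the sum of two
# fair dice exceeds 9 by simulating the experiment many times.
import numpy as np

rng = np.random.default_rng(seed=1)
n_trials = 100_000

rolls = rng.integers(1, 7, size=(n_trials, 2))   # two dice per trial
estimate = np.mean(rolls.sum(axis=1) > 9)

print(f"Simulated P(sum > 9) ~ {estimate:.4f}")  # exact value is 6/36 ~ 0.1667
```

The same pattern of drawing random inputs, computing the quantity of interest, and summarizing over many replications underlies the more elaborate applications described below.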

Error Bounds and Choosing Between Hypotheses

In reasoning with data, statisticians routinely engage in such activities as computing error bounds for estimates of quantities of interest or weighing the evidence in favor of one research hypothesis versus another. Although relatively simple formulas exist for computing error bounds and testing hypotheses in many common types of problems, simulation methods have become very useful and generally accepted tools for finding answers in studies that require more complicated statistical modeling.
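
The bootstrap is one widely used example of such a simulation-based tool. The text does not name a specific procedure, so the sketch below, which uses synthetic data, is only an illustration of how resampling can produce an approximate error bound, here a 95% percentile confidence interval for a mean.

```python
# Sketch of a percentile bootstrap confidence interval for a mean.
# The data, sample size, and 95% level are illustrative choices.
import numpy as np

rng = np.random.default_rng(seed=2)
data = rng.lognormal(mean=0.0, sigma=0.8, size=60)   # skewed synthetic sample

n_boot = 10_000
boot_means = np.empty(n_boot)
for b in range(n_boot):
    resample = rng.choice(data, size=data.size, replace=True)
    boot_means[b] = resample.mean()

lower, upper = np.percentile(boot_means, [2.5, 97.5])
print(f"sample mean = {data.mean():.3f}")
print(f"95% bootstrap CI: ({lower:.3f}, {upper:.3f})")
```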

The Sensitivity of Statistical Answers

Exponent statisticians may analyze test data to determine whether a product meets specifications, or predict the number of product failures that will occur in a future period based on analysis of field performance data. These types of statistical analyses, and the answers they yield, typically depend on mathematical models of the phenomenon under study. The data may be assumed to come from a population whose values vary according to a known probability model, such as a normal (Gaussian) or Weibull distribution. In real-world problems, however, the distributional assumption may be inappropriate or only approximately satisfied.
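
For instance, a field-performance analysis of this kind might fit a Weibull distribution to observed failure times and use the fitted model to estimate the fraction of units expected to fail within a future period. The sketch below uses synthetic, uncensored failure times and an illustrative five-year horizon; real field data usually involve censoring and other complications that this sketch ignores.

```python
# Sketch: fit a Weibull distribution to synthetic failure times and
# estimate the probability of failure within a future horizon.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=3)
failure_times = stats.weibull_min.rvs(c=1.8, scale=7.0, size=200,
                                      random_state=rng)   # years, synthetic

# Fit shape and scale, holding the location parameter at zero.
shape, loc, scale = stats.weibull_min.fit(failure_times, floc=0)

horizon = 5.0  # illustrative future period, in years
p_fail = stats.weibull_min.cdf(horizon, shape, loc=loc, scale=scale)
print(f"Estimated P(failure within {horizon:g} years) = {p_fail:.3f}")
```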

Simulation studies are useful in answering the important consequent question, “How far can we go wrong by making decisions on the basis of statistical methods whose assumptions are only approximately satisfied?” A computer program is written to generate data sets that exactly meet the original assumptions and then to vary the simulated population so that it moves progressively farther from those assumptions. The proposed statistical methods are applied to data from each simulated population to measure how departures from the assumptions affect the final answers. A statistical method is considered robust if deviations from its underlying assumptions have little impact on the conclusions drawn from the data analyses.
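
A hedged sketch of such a sensitivity study appears below: it checks how the coverage of a standard normal-theory confidence interval for a mean degrades as the simulated population is made increasingly skewed. The specific distributions, sample size, and number of replications are illustrative choices, not prescriptions from the text.

```python
# Sketch of a robustness study: coverage of a t-based confidence interval
# for the mean as the simulated population departs from normality.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=4)
n, n_sims, level = 30, 5_000, 0.95
t_crit = stats.t.ppf(0.5 + level / 2, df=n - 1)

for sigma in [0.0, 0.5, 1.0, 1.5]:
    true_mean = 1.0 if sigma == 0.0 else np.exp(sigma**2 / 2)
    covered = 0
    for _ in range(n_sims):
        if sigma == 0.0:
            x = rng.normal(loc=1.0, scale=0.5, size=n)   # assumptions hold exactly
        else:
            x = rng.lognormal(0.0, sigma, size=n)        # increasingly skewed population
        half_width = t_crit * x.std(ddof=1) / np.sqrt(n)
        covered += (x.mean() - half_width) <= true_mean <= (x.mean() + half_width)
    print(f"sigma = {sigma:3.1f}: coverage = {covered / n_sims:.3f} "
          f"(nominal {level:.2f})")
```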

Imitating Complex Processes

Standard probability models may not provide an adequate description of the relationship between the data collected and the process under study. For example, the relationship between cancer risk and environmental factors is complex, and it may be useful to create a computer simulation of the process of cancer initiation and growth. Similarly, understanding of the variability in a manufacturing process may be improved by creating a computer simulation of the inputs and outputs of the process. Such simulations provide an inexpensive way to evaluate potential improvements to the process; changes that look promising in the simulation can then be tested in the actual manufacturing process.
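
The sketch below illustrates one simple version of such a study, assuming a hypothetical output model with two variable inputs: it propagates input variability through the model, estimates the fraction of output outside specification, and compares a candidate improvement (tighter control of one input). All names, distributions, and specification limits are invented for the example.

```python
# Sketch of a process-variability simulation with a hypothetical output model.
import numpy as np

rng = np.random.default_rng(seed=5)
n = 200_000
spec_low, spec_high = 9.5, 10.5   # hypothetical spec limits on the output

def simulate(temp_sd):
    """Estimate the out-of-spec fraction for a given temperature variability."""
    temperature = rng.normal(200.0, temp_sd, size=n)   # process input 1
    feed_rate = rng.normal(5.0, 0.10, size=n)          # process input 2
    # Hypothetical model relating the inputs to a critical output dimension.
    dimension = (10.0
                 + 0.01 * (temperature - 200.0)
                 - 0.2 * (feed_rate - 5.0)
                 + rng.normal(0.0, 0.05, size=n))      # residual noise
    return np.mean((dimension < spec_low) | (dimension > spec_high))

print(f"current process : {simulate(temp_sd=15.0):.4%} out of spec")
print(f"improved control: {simulate(temp_sd=5.0):.4%} out of spec")
```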

Judging whether a simulation is sufficiently reliable typically involves the three-step approach of verification, validation, and accreditation. Verification means that the computer code is an accurate representation of what the statistician intended. Validation involves determining how closely the simulation matches the actual system being imitated. External validation is accomplished by comparing simulation output with real-world experience. Accreditation refers to the certification that a simulation model is acceptable for use for a specific purpose.
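
One simple, hedged example of an external-validation check is a formal comparison of simulation output against observed data; the sketch below applies a two-sample Kolmogorov–Smirnov test to synthetic stand-ins for both samples. In practice, validation rests on many such comparisons and on subject-matter judgment rather than on a single test.

```python
# Sketch of an external-validation check: compare simulation output with
# observed data using a two-sample Kolmogorov-Smirnov test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=6)
simulated = rng.normal(10.0, 0.15, size=500)    # output from the simulation
observed = rng.normal(10.02, 0.17, size=120)    # placeholder for measured field data

result = stats.ks_2samp(simulated, observed)
print(f"KS statistic = {result.statistic:.3f}, p-value = {result.pvalue:.3f}")
# A small p-value would flag a discrepancy worth investigating before
# relying on the simulation for decisions.
```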
