Data, Statistics, and Empirical Science
Statistics is the science of data: their collection, description, analysis, and interpretation. Adherence to sound statistical principles in conducting research studies strengthens the basis for empirical findings and conclusions. Exponent statisticians, programmers, and analysts apply their expertise in these principles from the design stage of an empirical research study through to its successful completion and conclusion.
Design of Studies, Surveys, and Experiments
Several basic questions require thoughtful consideration in assembling evidence to address a research problem:
- What is to be measured? Care should be taken to ensure that the problem is defined with sufficient precision so that meaningful research hypotheses can be formulated and evaluated with data that are available or that can be reliably gathered.
- What type of study is to be done? Controlled experiments, sample surveys, and observational studies are different ways of collecting data. Researchers must be cognizant of the strengths and limitations of each approach in determining which is most suitable given the characteristics of a particular project.
- How many observations are needed? Studies with too few observations lack statistical power and cannot yield convincing answers to the questions that originally motivated the work. Studies with too many observations may misallocate precious resources and risk confusing statistical and practical significance.
- What can be concluded about cause and effect? Controlled experiments, such as clinical trials, offer the strongest basis for causal inference, but they are typically not feasible for risk assessments and similar projects involving human subjects. Observational studies are more widely used in such cases, but their conclusions require more qualifications and attention to uncontrolled and potentially distorting (or “confounding”) factors.
- What are potential sources of bias? Bias can be introduced and studies can be undermined if elements of the population of interest cannot be sampled or measured reliably. Such bias can sometimes be minimized or eliminated by redefining the population or by modifying the data collection procedures.
Statisticians are typically of most value at the earliest stages of a research project, because even the most sophisticated analysis cannot salvage a poorly conceived or executed study design.
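The question of how many observations are needed can be made concrete with a standard power calculation. The sketch below is illustrative only and is not a description of Exponent's procedures. It assumes a two-sided comparison of two group means with equal group sizes and uses the normal approximation; the function name `sample_size_per_group` and the default values for `alpha` and `power` are assumptions chosen for the example.

```python
import math
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate observations needed per group to compare two means.

    effect_size is the standardized difference: the difference in means
    divided by the common standard deviation (assumed known here).
    Uses the normal approximation n = 2 * ((z_alpha + z_beta) / d)^2.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for a two-sided test
    z_beta = z.inv_cdf(power)           # quantile matching the target power
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


# Smaller effects require many more observations per group.
for d in (0.2, 0.5, 0.8):
    print(f"effect size {d}: n per group = {sample_size_per_group(d)}")
```

Under these assumptions, a standardized difference of 0.5 calls for roughly 63 observations per group. Because the required sample size grows with the inverse square of the effect size, halving the effect size roughly quadruples the requirement, which is why the effect worth detecting should be settled at the design stage.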
Related Case Study/Studies:
Design of Experiments and Observational Studies – Skiing and Snowboard Injuries
Study Design – Minimum Permeability of Concrete
Statistical Sampling – Construction Defects
Database Development
Exponent maintains and analyzes databases that provide accident and registration data for all cars and trucks in the U.S. In addition to data analysis, we provide database services, including customized, Windows-based programming tools that allow clients to easily explore data relevant to their particular interests.
We used state-of-the-art database technology to modernize our data processing, with the goals of fast processing, transparency, and efficient maintenance. We also set out to develop user-friendly query tools that give our own group members and clients easy access to data.
We migrated our vehicle registration databases to MS Access and SQL Server environments and developed a series of procedures within these environments to handle annual data processing and updating. The new methods delivered the desired improvements in speed, transparency, and maintainability. In addition, we developed a Windows-based, user-friendly query tool allowing instant, mouse-click access to vehicle registration data dating from 1975 to the present.
The ability to create, store, and access increasingly large data files containing millions of records brings the challenge of finding and implementing effective data mining techniques to discover knowledge in these systems. We therefore offer complementary experience with tools such as classification and regression trees, and we monitor new developments in this rapidly evolving area of research.
Related Case Study/Studies:
Virtual and Constructive Simulation for System Evaluation – Angler Activities on an Urban River
Displaying and Summarizing Data
Oftentimes, the findings of a study are most powerfully rendered not by complex statistical models but by simple visual displays. The most suitable display will depend on the types of data that have been gathered and the roles of key variables. The experienced analyst also relies on informative plots and numerical summaries of data to check for anomalies in data collection, to identify important associations between variables, and to evaluate assumptions underlying candidate statistical models. Without thorough exploratory analysis, subtle features of the data may be overlooked with potentially unfortunate consequences for the study’s findings and conclusions.
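As one illustration of the numerical summaries used to check for anomalies, the sketch below computes a five-number summary and flags values outside the conventional 1.5 × IQR fences. It is a minimal example rather than a prescribed workflow; the function name `summarize` and the sample readings are hypothetical.

```python
from statistics import quantiles


def summarize(values):
    """Return a five-number summary plus values outside the 1.5*IQR fences."""
    data = sorted(values)
    q1, q2, q3 = quantiles(data, n=4)  # quartiles (default 'exclusive' method)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return {
        "min": data[0],
        "q1": q1,
        "median": q2,
        "q3": q3,
        "max": data[-1],
        # Flagged values merit a closer look; they are not automatically errors.
        "flagged": [x for x in data if x < low_fence or x > high_fence],
    }


# Hypothetical readings with one suspicious entry.
readings = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(summarize(readings))
```

A flagged value may be a recording error or a genuine extreme observation. The summary only points to where the data collection records should be checked before any model is fit.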
Related Case Study/Studies:
Data Visualization – Spine Implants
Statistical Inference and Decision Making
Statistical inference refers to the process of reasoning about a population on the basis of sample information. Inference includes commonly applied methods for point and interval estimation of population characteristics, as well as the testing of research hypotheses. Although most researchers have been educated in the basic ideas of statistical inference, the analytic demands of even moderately sized studies can quickly exceed the capabilities of those without extensive statistical training. Multiple regression methods may be required to adjust for confounding factors, transformations of variables may be needed to improve compliance with model assumptions such as linearity and constant variance, or the response variable of interest may be categorical rather than continuous. At many academic institutions, the appropriate models for treating such cases are taught only to those receiving graduate-level training in statistics.
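To show why adjustment for a confounding factor matters, the sketch below compares a crude odds ratio with a stratified one. It uses the Mantel-Haenszel estimator, a simpler stand-in for the multiple regression methods mentioned above, and is not drawn from any Exponent analysis. All counts are hypothetical and were constructed so that the confounder alone produces the crude association; the function names are illustrative.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a: exposed cases, b: exposed non-cases,
    c: unexposed cases, d: unexposed non-cases.
    """
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio across (a, b, c, d) strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical counts, one 2x2 table per level of the confounder.
strata = [
    (40, 10, 8, 2),   # stratum 1: exposure common, outcome common
    (2, 8, 10, 40),   # stratum 2: exposure rare, outcome rare
]

# Collapsing over the confounder sums the tables cell by cell.
crude_table = tuple(sum(cells) for cells in zip(*strata))

print("crude OR:", round(odds_ratio(*crude_table), 2))
print("stratum ORs:", [odds_ratio(*s) for s in strata])
print("Mantel-Haenszel OR:", round(mantel_haenszel_or(strata), 2))
```

In this constructed example the crude odds ratio is well above 1, even though the odds ratio within each stratum is exactly 1. Ignoring the confounder would suggest an association that disappears once the strata are analyzed separately.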
Data used in risk analysis often come from multistage national probability samples. In such instances, advanced techniques for the analysis of complex survey data may be appropriate. Other specialized methods, such as survival analysis and time series, find application in many studies involving data on lifetimes or repeated observations of a process over time.
Exponent statisticians possess advanced training in stochastic modeling. Projects range from time-dependent Markov models of employment processes to logistic regression models of the risk of vehicle rollover to models of the sale and use of decorative candles.
Related Case Study/Studies:
Statistical Modeling – Candle Risk