Reproducible Research Standards and Definitions

An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures. - D. Donoho

Reproducible Research (RR) is the practice of distributing, along with a research publication, all data, software source code, and tools required to reproduce the results discussed in the publication. As such the RR package not only describes the research and its results, but becomes a complete laboratory in which the research can be reproduced and extended. - orgmode.org

In his 1994 editorial "The Scandal of Poor Medical Research," Doug Altman decries the poor reliability of the majority of medical research. Medical research is afflicted with common methodologic problems, including inappropriate study design, analysis, and interpretation. A significant portion of medical research is not reproducible because of these problems, and because researchers often have difficulty reproducing even their own findings. Even when a research journal allows the authors the "luxury" of space to describe their methods, such text can never be specific enough for readers to reproduce exactly what was done.

There are many ways that research is non-reproducible, including

  1. Tweaking instrumentation to work on a series of patients but not in general
  2. Pre-statistician "normalization" of data and background subtraction
  3. Poorly studied high-dimensional feature selection
  4. Programming errors
  5. Lack of documentation
  6. Failing to script multiple-step procedures
  7. Copying and pasting results into manuscripts
  8. Insufficient detail in scientific articles
  9. Lack of an audit trail

Science requires that all results be reproducible. There is a growing realization that all steps of data manipulation, processing, transformation, and analysis should be scripted, and that purely interactive approaches that do not generate an audit trail, such as Excel, should play no role in scientific investigation.
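
As one illustration of what a scripted step with an audit trail might look like, the following Python sketch runs each transformation through a small wrapper that records what was done, when, and on which data. The function names, the sample records, and the 140 mmHg cutoff are all hypothetical and chosen only for illustration; a real project would log to a file and would likely also record software versions.

```python
import hashlib
import json
from datetime import datetime, timezone


def fingerprint(rows):
    """Return a SHA-256 digest of the data so each step's input and output is traceable."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def run_step(name, func, rows, audit_log):
    """Apply one scripted transformation and append a record of it to the audit log."""
    result = func(rows)
    audit_log.append({
        "step": name,
        "time_utc": datetime.now(timezone.utc).isoformat(),
        "input_sha256": fingerprint(rows),
        "output_sha256": fingerprint(result),
        "rows_in": len(rows),
        "rows_out": len(result),
    })
    return result


def drop_missing(rows):
    """Remove records with a missing systolic blood pressure (hypothetical rule)."""
    return [r for r in rows if r["sbp"] is not None]


def add_high_sbp_flag(rows):
    """Derive a flag variable; the 140 mmHg cutoff is illustrative only."""
    return [dict(r, high_sbp=r["sbp"] >= 140) for r in rows]


# Hypothetical raw data standing in for an extraction from the primary database.
raw = [
    {"id": 1, "sbp": 150},
    {"id": 2, "sbp": None},
    {"id": 3, "sbp": 120},
]

audit_log = []
clean = run_step("drop_missing", drop_missing, raw, audit_log)
analysis = run_step("add_high_sbp_flag", add_high_sbp_flag, clean, audit_log)

# The log shows every step, its row counts, and matching input/output digests.
print(json.dumps(audit_log, indent=2))
```

Because the output digest of one step equals the input digest of the next, a reader of the log can confirm that no undocumented manual edit happened between steps, which is exactly what an interactive spreadsheet session cannot show.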

For statistical reports, the ability to recompile a report automatically whenever its components (calculations, tables, or graphs) change yields major gains in efficiency and freedom from transcription errors. Markup languages such as LaTeX, HTML, and special XML formats are useful for this purpose. Literate programming, in which a single source document contains the analysis code as well as all the text for the final report, has been found to result in better documentation for both the code and the text.
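
A minimal sketch of the idea, assuming a LaTeX report: instead of typing summary statistics into the manuscript, a script generates the table markup directly from the data, so the table is always consistent with the latest analysis. The function name `latex_table` and the sample values are hypothetical. This is not a full literate programming system, only the "no copying and pasting" part of it.

```python
import statistics


def latex_table(groups):
    """Build a LaTeX tabular of N, mean, and SD per group directly from the data."""
    lines = [
        r"\begin{tabular}{lrrr}",
        r"Group & N & Mean & SD \\",
        r"\hline",
    ]
    for name, values in groups.items():
        mean = statistics.mean(values)
        sd = statistics.stdev(values)  # sample standard deviation
        lines.append(rf"{name} & {len(values)} & {mean:.1f} & {sd:.1f} \\")
    lines.append(r"\end{tabular}")
    return "\n".join(lines)


# Hypothetical measurements; in practice these come from the analysis files.
groups = {
    "Placebo": [120, 130, 140],
    "Treatment": [110, 120, 130],
}

table = latex_table(groups)
print(table)
```

In practice the generated text would be written to a file that the report source pulls in (for LaTeX, with `\input`), so rerunning the script and recompiling updates every number in the table at once.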

Projects require multiple programming and writing steps:

  1. create/update primary database (e.g., using SQL)
  2. create/update extractions from primary database (e.g., using SAS or R to merge data tables)
  3. create/update S analysis files
  4. obtain computed values, tabular, and graphics output on latest data
  5. assemble new computed values, tables, and graphics into a report
  6. recompile the report into a final output format such as PDF

The more of these steps that can be automated, the more efficient and error-free the analysis becomes. With proper planning, an entire report, manuscript, dissertation, or book can be reproduced by running a single computer command when changes occur in the operating system, statistical software, graphics engines, source data, derived variables, analysis, or interpretation.
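
The steps listed above can be chained so that one command reruns everything. The Python sketch below is only an illustration under stated assumptions: it uses an in-memory SQLite database in place of a real primary database, a hypothetical `visits` table with made-up values, and a plain-text report in place of a compiled PDF.

```python
import sqlite3
import statistics


def build_database(conn):
    """Step 1: create/update the primary database (hypothetical schema and values)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS visits (patient_id INTEGER, arm TEXT, sbp REAL)"
    )
    conn.execute("DELETE FROM visits")
    conn.executemany(
        "INSERT INTO visits VALUES (?, ?, ?)",
        [(1, "A", 120.0), (2, "A", 130.0), (3, "B", 140.0), (4, "B", 150.0)],
    )
    conn.commit()


def extract(conn):
    """Step 2: pull the analysis extract from the primary database."""
    return conn.execute(
        "SELECT arm, sbp FROM visits ORDER BY patient_id"
    ).fetchall()


def analyze(rows):
    """Steps 3-4: compute summary values from the latest data."""
    by_arm = {}
    for arm, sbp in rows:
        by_arm.setdefault(arm, []).append(sbp)
    return {arm: statistics.mean(values) for arm, values in sorted(by_arm.items())}


def assemble_report(means):
    """Step 5: insert computed values into the report text, with no manual copying."""
    lines = ["Mean systolic blood pressure by arm"]
    lines += [f"  Arm {arm}: {mean:.1f} mmHg" for arm, mean in means.items()]
    return "\n".join(lines)


def main():
    """Run every step in order; step 6 (compiling to a final format) would follow."""
    conn = sqlite3.connect(":memory:")
    try:
        build_database(conn)
        report = assemble_report(analyze(extract(conn)))
    finally:
        conn.close()
    return report


if __name__ == "__main__":
    print(main())
```

Running the script is the "single computer command": if the source data or a derived variable changes, rerunning it regenerates every downstream value in the report. Larger projects often delegate this ordering to a build tool such as `make`, which reruns only the steps whose inputs changed.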

Time turns each one of us into another person, and by making effort to communicate with strangers, we help ourselves to communicate with our future selves. - Schwab and Claerbout

It has been said that the analysis code provides the ultimate documentation of the "what, when, and how" for data analyses.

Topic revision: r4 - 24 Nov 2010 - 09:17:57 - JamesWare
 

Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.