Best Practices for Researchers Publishing Computational Results
This guide is intended both as a living document and as a generic approach that is as broadly useful as possible to computational scientists who wish to practice reproducible research. Many fields have specialized, accepted ways of implementing the principles embodied in these best practices. I will describe these where I can, adding detail over time. In the meantime, this document provides a rough guide to producing and disseminating really reproducible computational research.
Suggestions and collaborations are welcome. Create an account to edit this wiki, or email comments to me at BestPracticesATstodden.net.
A fixed citable version of this document is available here. If you have found this information useful in your own practice of reproducible research, please cite the document: V. Stodden, "Best Practices for Researchers When Publishing Computational Results," SSRN, Sept 1, 2012.
Principles and Best Practices
In this document we treat reproducibility as a digital concept, to distinguish it from reproducibility from first principles, that is, repeating an experiment from scratch (see Notions of reproducibility). Our goal is to facilitate the verification of computational results from the digital data files and the computer instructions required to generate the tables and figures in published articles.
This document assumes you have the right to make the data and code publicly available. Usually this means you either generated the data and code yourself, or have permission from others who did so.
Best Practices at Publication
- Data must be available and accessible. In this context the term "data" means the raw data files used as the basis for the computations, which others need in order to regenerate the published computational findings.
- Code and methods must be available and accessible. The traditional methods section in a typical publication does not communicate sufficient detail for a knowledgeable reader to replicate computational results. Authors therefore need to make the complete set of instructions, typically in the form of computer scripts or workflow pipelines, conveniently available.
- Citation. Do it. If you use data you did not collect from scratch, or code you did not write, however little, cite it. Citation standards for code and data are discussed, but getting the citation perfect matters less than making sure the work is cited at all.
- Copyright and Publisher Agreements. Publishers, almost uniformly, request that authors transfer all ownership rights over the article to them. All they really need is the authors' permission to publish.
- Supplemental materials. Publishers should establish style guides for supplemental sections, and authors should organize their supplemental materials following best practices.
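As one possible illustration of making data verifiably available, the sketch below records a SHA-256 checksum for each raw data file so that a reader who downloads the files can confirm they match the ones used for the published results. This is only a minimal sketch under stated assumptions, not a prescribed method: the function names (`sha256_of_file`, `build_manifest`, `verify_manifest`) and the idea of keeping all raw data under a single directory are hypothetical choices made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict[str, str]:
    """Map each file's path (relative to data_dir) to its checksum.

    Assumes, for illustration only, that all raw data lives under data_dir.
    """
    return {
        p.relative_to(data_dir).as_posix(): sha256_of_file(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }


def verify_manifest(data_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose content differs."""
    current = build_manifest(data_dir)
    return sorted(
        name for name, digest in manifest.items() if current.get(name) != digest
    )
```

An author could publish the manifest alongside the data and code; a reader would then call `verify_manifest` on their downloaded copy and expect an empty list before rerunning the scripts.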
Pre-publication Best Practices
- Data management best practices
- Programming Best Practices
- Provenance, Workflow Tracking, and Publishing Environments
Influences from Sources External to the Research Process
- Funded Research Requirements.
- Publisher Requirements.
- Institutional Intellectual Property Requirements.
References and further reading
- Tools and technology
- High performance computing and reproducibility
- Influencing policy and changing the culture
- Education, courses, and training
- References and Further Reading
- Other Online Best Practice Guides for Research