ICERM Reproducibility in Computational and Experimental Mathematics: Readings and References
This page collects useful references for the ICERM workshop Reproducibility in Computational and Experimental Mathematics.
Workshop Report: http://stodden.net/icerm_report.pdf
Materials from the ICERM Workshop
See also the abstracts posted on the workshop page (click on "Schedule and Supporting Material").
Thought Pieces Submitted by Participants
- Randy LeVeque, Top Ten Reasons to Not Share Your Code (and why you should anyway). link
- Nicolas Limare, Running a Reproducible Research Journal, with Source Code Inside. link
- Sébastien Li-Thiao-Té, Literate Research versus Reproducible Research. link
- Ursula Martin, The social machine of mathematics. link
- Fernando Perez, Reproducible software vs. reproducible research. link
- Todd Rosenquist and Shane Story, Using the Intel Math Kernel Library and Intel Compilers to obtain Numerical Run-to-run Reproducible Results. link original source
- Anthony Scopatz, Passive Reproducibility: It’s Not You, It’s Me. link
- Benjamin Seibold, Making reproducible computational research a reasonable choice for young faculty on tenure track. link
Slides from 5-Minute Lightning Talks
- Noah Clemons, "How to Enforce Reproducibility with your Existing MKL Code" .pptx
- Neil Chue Hong, "The Foundations of Digital Research" .pdf
- David Ketcheson, online demo link
- Nicolas Limare, "My Christmas List for Reproducibility" .pdf
- Sébastien Li-Thiao-Té, "Lepton: Literate Executable Papers" .pdf
- Benjamin Seibold, .pdf
- Matthias Troyer, "Publishing executable papers" .pdf
- Yihui Xie, "knitr: Starting From Reproducible Homework" .pdf
- Adam Asare, "ITN TrialShare: Promoting reproducible research and transparency in clinical trials" .pptx
- Sara Billey, "Canonical Representations of Theorems" .pptx
- David Koop, .key
- Sarah Michalak, "Silent Data Corruption and Other Anomalies" .pdf
- Ian Mitchell, "Reproducibility(?) Review Proposal" .pdf
- Geoffrey Oxberry, "Towards Turnkey Reproducibility" .pdf
- Bob Robey, "Enhanced Precision Sums for Parallel Computing Reproducibility" .pdf
- Michael Rubinstein, "The role of computation and data in my number theoretic work" .pdf
- Fernando Seabra Chirigati, .pptx
Breakout Group Summary Slides
- Tools Group link
- Funding Policy Group .pdf
- Journals/Publication Policy Group .pptx
- Numerical Reproducibility Group .pptx
- Ontology and V&V Group .pptx
- Rewards/Culture Group .pptx
- Teaching Reproducibility Group link
References and Links Collected
Previous Workshops and Roundtables on Reproducible Research
- Applied Mathematics Perspectives 2011: Reproducible Research: Tools and Strategies for Scientific Computing
- AAAS 2011: The Digitization of Science: Reproducibility and Interdisciplinary Knowledge Transfer
- Community Forum on Reproducible Research, ICIAM 2011.
- Yale Law School: Data and Code Sharing Roundtable, including a link to the resulting Reproducible Research Declaration (pdf) and Contributed Thought Pieces.
Why Reproducibility is an Issue
Examples where Lack of Reproducibility Causes Problems
- Duke Trials: Reports
- [Jha2012] Alok Jha, “Tenfold increase in scientific research papers retracted for fraud,” U.K. Guardian, 1 Oct 2012, available at http://www.guardian.co.uk/science/2012/oct/01/tenfold-increase-science-paper-retracted-fraud.
- [Enserink2012] Martin Enserink, “Final report: Stapel affair points to bigger problems in social psychology,” Science, 28 Nov 2012, available at http://news.sciencemag.org/scienceinsider/2012/11/final-report-stapel-affair-point.html.
Notions of Reproducibility
A variety of terminology is used in connection with reproducible research. The Final Report contains a section on Terminology; below are links related to some of these terms.
Verification and Validation (V&V)
- What is verification and validation? Wikipedia article
- Example paper following V&V: William J. Rider and Douglas B. Kothe, Reconstructing Volume Tracking, Journal of Computational Physics, Volume 141, Issue 2, 10 April 1998, Pages 112-152.
Quantify the uncertainty in a computation Wikipedia
Identical Code Output
Policies on Data and Code Sharing
Funding Agency Policies
- Digital Research Data Sharing and Management Report from National Science Board Panel, March, 2011.
- SHERPA RoMEO listing and classifying publisher copyright & self-archiving policies for a wide range of academic publishers.
- [IMU2010] IMU General Assembly, “Best Current Practices for Journals”, International Mathematical Union, available at: http://www.mathunion.org/fileadmin/CEIC/bestpractice/bpfinal.pdf
- Mathematical Programming Computation requiring code and data deposit as a condition of publication.
- American Economic Review, Data Submission Requirement.
- ACM Transactions on Mathematical Software Algorithms Policy.
- Science Magazine Policy on availability of data and code.
- Wouters et al., The Public Domain of Digital Research Data, 2003. Includes a discussion of the Journal of Cognitive Neuroscience's controversial 2000 requirement of fMRI data submission.
- Gary King's Sample Replication Journal Policy.
- Geophysics source-code guidelines.
- IPOL Software Guidelines and Author Manual. This is cross-referenced from the SIAM Journal on Imaging Science link.
- Geoscientific Model Development model description papers require source code.
- The bioinformatic-journal/software hybrid, 2009.
- Biostatistics kite-marks papers satisfying reproducibility requirements.
Legal Issues and Frameworks
- V. Stodden, Enabling Reproducible Research: Licensing for Scientific Innovation, 2009.
- V. Stodden, The Legal Framework for Reproducible Scientific Research: Licensing and Copyright, Computing in Science and Engineering, vol. 11, no. 1, pp. 35-40, Jan./Feb., 2009.
- The OECD Working Group on Neuroinformatics, Neuroscience Data and Tool Sharing: A Legal and Policy Framework for Neuroinformatics, Neuroinformatics 1(2), 2003.
- [HIPAAref]: http://www.hhs.gov/ocr/privacy/index.html
Licenses and copyright, citation
- V. Stodden, Legal Attribution and Academic Citation: The Promise of Facilitated CC License Compliance, 2009.
- [Stodden09] "The Legal Framework for Reproducible Research in the Sciences: Licensing and Copyright", IEEE Computing in Science and Engineering, 11(1), January 2009, p.35-40.
- [Stodden12] "Software Patents as a Barrier to Scientific Transparency: An Unexpected Consequence of Bayh-Dole," With I. Reich, The Seventh Annual Conference on Empirical Legal Studies (CELS 2012), Stanford, CA. Nov, 2012.
- Data producers deserve citation credit, Nature Genetics, 2009.
- D. Donoho, How to be a Highly Cited Author in the Mathematical Sciences, 2002.
- H. Piwowar et al., Sharing Detailed Research Data Is Associated with Increased Citation Rate, PLoS ONE, 2007.
- [Hodges2011] Wilfrid Hodges, “CEIC Copyright Recommendations: What do You Want from Your Publisher?”, link
- Nature Blog 2008: The Great Beyond: Stiglitz and Sulston: Who Owns Science?
- University of Manchester Press Release 2008: Nobel duo ask: "Who Owns Science?"
- Example of social math link
- P. Bourne et al., Open access: taking full advantage of the content, 2008.
- B. McCullough, Do economics journal archives promote replicable research?, 2008.
- M. Gerstein et al., Structured digital abstract makes text mining easy, 2007.
- H. Gibbs, DISC-UK DataShare: State-of-the-Art Review, 2007.
- M. Seringhaus et al., Towards tomorrow's information architecture, 2007.
- M. Barinaga, Still Debated, Brain Image Archives Are Catching On, Science, 2003.
- H. Piwowar, Data Sharing.
- G. Steinhart, DataStaR: An Institutional Approach to Research Data Curation, 2007.
- G. Steinhart et al., Establishing Trust in a Chain of Preservation, D-Lib Magazine, 2009.
Influencing Policy and Changing the Culture
- Arzberger et al., An International Framework to Promote Access to Data, Science, 2004.
- Science Code Manifesto link
- Reproducible Research, Computing in Science and Engineering, 2010.
- [Meyer2009] Bertrand Meyer, Christine Choppy, Jørgen Staunstrup and Jan van Leeuwen, “Research evaluation for computer science,” Communications of the ACM, vol. 52, no. 4, April 2009, pages 31-34, http://dl.acm.org/citation.cfm?doid=1498765.1498780, also available at http://medicina.unica.it/pacs/documenti/Research_Evaluation_CACM.pdf.
- [Patterson1999] David Patterson, Lawrence Snyder and Jeffrey Ullman, “Evaluating Computer Scientists and Engineers For Promotion and Tenure,” August, 1999, link.
Tools and Technologies
Some version control systems (VCS) include:
Some public hosting sites for VCS repositories include:
Workflow Management Systems
- Galaxy link
- Madagascar link
- Sumatra link
- Taverna link
- VisTrails link
- Trident link
- VCR link Verifiable Computational Research
- D. Koop, E. Santos, P. Mates, H. Vo, P. Bonnet, B. Bauer, B. Surer, M. Troyer, D. Williams, J. Tohline, J. Freire and C. Silva, A Provenance-Based Infrastructure to Support the Life Cycle of Executable Papers, in Proceedings of the International Conference on Computational Science, 2011. link
- J. Freire and C. Silva, Making Computations and Publications Reproducible with VisTrails, in Computing in Science and Engineering 14(4): 18-25, 2012.
Literate Programming Tools
Some literate programming tools include:
Some notebook/publishing tools include:
- Maple link
- Mathematica link
- Matlab link
- RStudio link
- Sage link
- Open notebook science Wikipedia
- eCAT Electronic Lab Notebook.
Tools that capture and preserve a software environment
Package code along with complete environment (OS, compilers, graphics tools, etc)
Web platforms for running code
Integrated tools for version control and collaboration
Interactive theorem proving
- A Special Issue of AMS Notices on Formal Proof 
- Computer-assisted theorem proving, e.g. Flyspeck
- Automated reasoning link
- Thomas C. Hales, “A proof of the Kepler conjecture,” Annals of Mathematics, vol. 162, no. 3, pg. 1065-1185.
- NIST Digital Library of Mathematical Functions
Tools that can aid in reproducible research
These tools may be useful in conducting reproducible research.
- Matlab function that provides information about the CPU and operating system link
- [Bailey2012] David H. Bailey, Roberto Barrio, and Jonathan M. Borwein, “High precision computation: Mathematical physics and dynamics,” Applied Mathematics and Computation, vol. 218 (2012), pg. 10106-10121.
- [Bailey1992] David H. Bailey, “Misleading performance reporting in the supercomputing field,” Scientific Programming, vol. 1 (Winter 1992), pg. 141-151.
Parallel Computing Issues
- [Borkar2012] Borkar, S. (2012) “Exascale Challenges, Why Resiliency?” talk presented at the Inter-Agency Workshop on HPC Resilience at Extreme Scale, February 21, 2012.
- [Constantinescu2000] Constantinescu, C. (2000) “Teraflops supercomputer: Architecture and validation of the fault tolerance mechanisms” IEEE Transactions on Computers 49:886-894.
- [DarpaResilience2009] (Mootaz) Elnozahy (editor), “System Resilience at Extreme Scale”, available at: http://institute.lanl.gov/resilience/docs/IBM%20Mootaz%20White%20Paper%20System%20Resilience.pdf
- [HECResilience2009] Nathan DeBardeleben, et al., “High‐End Computing Resilience: Analysis of Issues Facing the HEC Community and Path‐Forward for Research and Development”, available at: http://institute.lanl.gov/resilience/docs/HECResilience_WhitePaper_Jan2010_final.pdf
- [InterAgency2012] John Daly, et al., “Inter-Agency Workshop on HPC Resilience at Extreme Scale”, available at: http://institute.lanl.gov/resilience/docs/Inter-AgencyResilienceReport.pdf
- [Kola2005] Kola, G., Kosar, T. and M. Livny (2005) “Faults in large distributed systems and what we can do about them” Proceedings of the 11th European Conference on Parallel Processing (Euro-Par 2005).
- [Robey2011] Robey, R., Robey, J., and Aulwes, R., “In Search of Numerical Consistency in Parallel Computing”, Vol. 37, Issue 1, Jan 2011
- [TowardsExascaleResilience2009] Franck Cappello, et al., “Towards Exascale Resilience”, International Journal of High Performance Computing Applications, Vol 23, Issue 4, Nov 2009, pp 374-388.
Silent Data Corruption
- [Autran2010] Autran, JL, Munteanu, D., Roche, P., Gasiot, G., Martinie, S., Uznanski, S., Sauze, S., Semikh, S., Yakushev, E., Rozov, S. et al. (2010) “Soft-errors induced by terrestrial neutrons and natural alpha-particle emitters in advanced memory circuits at ground level” Microelectronics Reliability 50: 1822-1831.
- [Li2010] Li, X., Huang, M.C., Shen, K. and L. Chu (2010) “A realistic evaluation of memory hardware errors and software system susceptibility” Proceedings of the 2010 USENIX conference on USENIX annual technical conference.
- [Michalak2010] Sarah Michalak (2010) “Soft Errors, Silent Data Corruption, and Exascale Computing,” invited talk at the Resilience Summit 2010, available: http://www.csm.ornl.gov/srt/conferences/ResilienceSummit/2010/pdf/michalak.pdf
- [Michalak2012] Sarah Michalak, Andrew DuBois, Curtis Storlie, Heather Quinn, William Rust, David DuBois, David Modl, Andrea Manuzzato and Sean Blanchard (2012) “Assessment of the Impact of Cosmic-Ray-Induced Neutrons on Hardware in the Roadrunner Supercomputer,” IEEE Transactions on Device and Materials Reliability 12:2, 445-454.
- [Constantinescu2005] Constantinescu, C. (2005) “Dependability Benchmarking Using Environmental Test Tools,” Proceedings of the 2005 Reliability and Maintainability Symposium 567-571.
- The Computer As Crucible: An Introduction to Experimental Mathematics, Jonathan Borwein and Keith Devlin. link
- D.H. Bailey and J.M. Borwein, “Exploratory Experimentation and Computation.” Notices of the AMS, 58 (10) (2011), 1410-1419.
- J.M. Borwein and R.E. Crandall, “Closed forms: what they are and why we care.” Notices Amer. Math. Soc. 60:1 (2013), xxx-xxx.
- Jonathan Borwein and Veselin Jungic, “Organic Mathematics then and now.” Notices of the AMS, 59 (2012), 416-419.
- Jonathan Borwein, Peter Borwein and Veselin Jungic, “Remote Collaboration: Six Years of the Coast-To-Coast Seminar Series.” Science Communication, 34 (3) (2012), 419-428.
- [Borwein2008] Jonathan M. Borwein and David H. Bailey, Mathematics by Experiment: Plausible Reasoning in the 21st Century, A K Peters, Natick, MA, 2008.
Education, Courses, and Training
Regular courses teaching some aspects of reproducibility
- Reproducible Research & Software Carpentry at UBC taught by Ian Mitchell and Dhavide Aruliah
- UT-Austin course (need link)
- CS 291 at KAUST (need link)
- High performance scientific computing course at Univ. of Washington taught by Randy LeVeque
- How to Write a Publishable Paper as a Class Project by Gary King
Short courses and summer schools
- Winter School in eScience on Reproducible Science And Modern Scientific Software, Geilo, Norway, 2013.
On-line tutorials and other sources
- Best Practices for Scientific Computing, arXiv.org, 2012.
- Workflows for Reproducible Research in Computational Neuroscience, Andrew Davison.
Other Readings and Publications
- N. Barnes, Publish your computer code: it is good enough, Nature 467 (2010) p. 753. link
- Z. Merali, Computational science: ...Error Why scientific programming does not compute. Nature 467(2010), pp. 775-777. link
- K. A. Baggerly and D. A. Berry, Reproducible Research, AMSTAT NEWS, Jan. 1, 2011 link
- A. Jogalekar, Computational research in the era of open access: Standards and best practices, Scientific American (2013) link
- A. Morin, J. Urban, P.D. Adams, I. Foster, A. Sali, D. Baker, and P. Sliz, Shining Light into Black Boxes, Science 336 (2012) link.
- Juliana Freire, Philippe Bonnet and Dennis Shasha, Computational reproducibility: state-of-the-art, challenges, and database research opportunities, in Proceedings of SIGMOD, 593-596 (2012).
- R. Peng, Reproducible Research and Biostatistics, Biostatistics, November 2009.
- S. Fomel and J. Claerbout, "Reproducible Research", Guest Editors' Introduction to a Special Issue of CiSE. link
- D. Donoho et al. Reproducible Research in Computational Harmonic Analysis, 2009. (pdf).
- I. Manolescu et al., Repeatability & Workability Evaluation of SIGMOD 2009, 2009.
- Philippe Bonnet, et al., Repeatability and workability evaluation of SIGMOD 2011, SIGMOD Record, 40, Issue 2 (June 2011), pp. 45-48.
- P. Vandewalle, J. Kovačević, and M. Vetterli, Reproducible Research in Signal Processing - What, Why, and How, 2009.
- J. J. Quirk, Computational Science "Same old silence, same old mistakes", something more is needed. link
- B. McCullough, Got Replicability? The Journal of Money, Credit and Banking Archive, 2007.
- B. McCullough, McGeary, and Harrison, Lessons from the JMCB Archive, 2006.
- MATLAB Blog 2009: Reproducible Research in Signal Processing.
- R. LeVeque, Wave propagation software, computational science, and reproducible research, 2006.
- J. Ioannidis, Why Most Published Research Findings Are False, 2005.
- R. Anderson et al. The Role of Data & Program Code Archives in the Future of Economic Research, 2005.
- A. Rossini, F. Leisch, Literate Statistical Practice, 2001.
- D. Donoho, J. Buckheit, WaveLab and reproducible research, 1995.
- G. King, Replication, Replication, 1995.
- K. Price, Anything You Can Do, I Can Do Better (No You Can’t)..., 1986.
- M. Schwab, N. Karrenbach and J. Claerbout, Making scientific computations reproducible, Computing in Science & Engineering, 2000.
- P. Schofield et al., Post-publication sharing of data and tools, Nature 461, 10 September 2009.
- WaveLab, reproducible research in wavelets.
- Gentleman et al., Bioconductor: Open Software Development for Computational Biology and Bioinformatics, 2004.
- EPFL Audiovisual Communications Lab page containing reproducible papers.
- SparseLab, reproducible research in sparse modeling and compressed sensing.
- R. LeVeque, Python Tools for Reproducible Research on Hyperbolic Problems, Computing in Science and Engineering, vol. 11, no. 1, pp. 19-27, Jan./Feb., 2009. [link with tools http://www.amath.washington.edu/~rjl/pubs/cise09].
- C. Savage and A. Vickers, Empirical Study of Data Sharing by Authors Publishing in PLoS Journals, 2009. (Some reactions to the article).
- J. Ioannidis et al., Repeatability of Published Microarray Gene Expression Analyses, 2008.
- K. Baggerly and K. Coombes, Deriving Chemosensitivity from Cell Lines: Forensic Bioinformatics and Reproducible Research in High-Throughput Biology, 2009.
- G. Baiocchi, Reproducible Research in Computational Economics: Guidelines, Integrated Approaches, and Open Source Software, 2007.
- Patterns of information use and exchange: case studies of researchers in the life sciences, Nov 2009.
- Alzheimer’s Disease Neuroimaging Initiative (ADNI) Data Website, and their data sharing and publication policy.
- Integrating with integrity, Nature Genetics, 2010.
- Sulston's 2002 account of the story of the 1996 Bermuda Declaration: "Heritage of Humanity".
- The Toronto Statement, 2009.
- E. Marshall, Bermuda Rules: Community Spirit, With Teeth, Science, 2001.
- [King2006] King, G. “Publication, Publication”. PS: Political Science and Politics, Vol. XXXIX, No. 1 (January, 2006), 119-125
- M. Hildebrandt, Profiling the European Citizen: Cross-disciplinary Perspectives, 2008.
- Brian Matthews, Brian McIlwrath, David Giaretta, Esther Conway, The Significant Properties of Software: A Study, JISC Report, 2008.
- [Hager2012] Georg Hager, “Fooling the masses – Stunt 9: Boast massive speedups with accelerators!,” blog, 3 Nov 2012, available at http://blogs.fau.de/hager/category/fooling-the-masses.
- [Panzer-Steindel2007] Panzer-Steindel, B. (2007) “Data integrity,” available: http://indico.cern.ch/getFile.py/access?contribId=3&sessionId=0&resId=1&materialId=paper&confId=13797.