ICERM Reproducibility in Computational and Experimental Mathematics: Readings and References
This page collects useful references for the ICERM workshop Reproducibility in Computational and Experimental Mathematics.
Short links: http://icerm.brown.edu/tw12-5-rcem-wiki.php, http://is.gd/RRlinks, or http://goo.gl/QbDOx.
Workshop Report: http://stodden.net/icerm_report.pdf
Materials from the ICERM Workshop
See also the abstracts posted on the workshop page... click on "Schedule and Supporting Material".
Thought Pieces Submitted by Participants
- Randy LeVeque, Top Ten Reasons to Not Share Your Code (and why you should anyway). link
- Nicolas Limare, Running a Reproducible Research Journal, with Source Code Inside. link
- Sébastien Li-Thiao-Té, Literate Research versus Reproducible Research. link
- Ursula Martin, The social machine of mathematics. link
- Fernando Perez, Reproducible software vs. reproducible research. link
- Todd Rosenquist and Shane Story, Using the Intel Math Kernel Library and Intel Compilers to obtain Numerical Run-to-run Reproducible Results. link original source
- Anthony Scopatz, Passive Reproducibility: It’s Not You, It’s Me. link
- Benjamin Seibold, Making reproducible computational research a reasonable choice for young faculty on tenure track. link
Slides from 5-Minute Lightning Talks
Wednesday
- Noah Clemons, "How to Enforce Reproducibility with your Existing MKL Code" .pptx
- Neil Chue Hong, "The Foundations of Digital Research" .pdf
- David Ketcheson, online demo link
- Nicolas Limare, "My Christmas List for Reproducibility" .pdf
- Sebastien Li-Thiao-Te, "Lepton : Literate Executable Papers" .pdf
- Benjamin Seibold, .pdf
- Matthias Troyer, "Publishing executable papers" .pdf
- Yihui Xie, "knitr: Starting From Reproducible Homework" .pdf
Thursday
- Adam Asare, "ITN TrialShare: Promoting reproducible research and transparency in clinical trials" .pptx
- Sara Billey, "Canonical Representations of Theorems" .pptx
- David Koop, .key
- Sarah Michalak, "Silent Data Corruption and Other Anomalies" .pdf
- Ian Mitchell, "Reproducibility(?) Review Proposal" .pdf
- Geoffrey Oxberry, "Towards Turnkey Reproducibility" .pdf
- Bob Robey, "Enhanced Precision Sums for Parallel Computing Reproducibility" .pdf
- Michael Rubinstein, "The role of computation and data in my number theoretic work" .pdf
- Fernando Seabra Chirigati, .pptx
Breakout Group Summary Slides
Wednesday
- Tools Group link
- Funding Policy Group .pdf
- Journals/Publication Policy Group .pptx
- Numerical Reproducibility Group .pptx
Thursday
- Tools Group link
- Ontology and V&V Group .pptx
- Rewards/Culture Group .pptx
- Teaching Reproducibility Group link
Final Report
To appear.
References and Links Collected
Previous Workshops and Roundtables on Reproducible Research
- Applied Mathematics Perspectives 2011: Reproducible Research: Tools and Strategies for Scientific Computing
- AAAS 2011: The Digitization of Science: Reproducibility and Interdisciplinary Knowledge Transfer
- Community Forum on Reproducible Research, ICIAM 2011.
- Yale Law School: Data and Code Sharing Roundtable, including a link to the resulting Reproducible Research Declaration (pdf) and Contributed Thought Pieces.
Why Reproducibility is an Issue
- Roger Peng's blog post about Reproducible Research: A Dissenting Opinion by Chris Drummond.
Examples where Lack of Reproducibility Causes Problems
- Duke Trials: Reports
- [Jha2012] Alok Jha, “Tenfold increase in scientific research papers retracted for fraud,” U.K. Guardian, 1 Oct 2012, available at http://www.guardian.co.uk/science/2012/oct/01/tenfold-increase-science-paper-retracted-fraud.
- [Enserink2012] Martin Enserink, “Final report: Stapel affair points to bigger problems in social psychology,” Science, 28 Nov 2012, available at http://news.sciencemag.org/scienceinsider/2012/11/final-report-stapel-affair-point.html.
Notions of Reproducibility
A variety of terminology is used in connection with reproducible research. The Final Report contains a section on terminology; the links below relate to some of these terms.
Reproducible/Replicable/Auditable Research
Verification and Validation (V&V)
- What is verification and validation? Wikipedia article
- Example paper following V&V: William J. Rider and Douglas B. Kothe, Reconstructing Volume Tracking, Journal of Computational Physics, Volume 141, Issue 2, 10 April 1998, Pages 112-152.
Uncertainty Quantification
- Quantifying the uncertainty in a computation: Wikipedia article
Identical Code Output
Code Archival
Policies on Data and Code Sharing
Funding Agency Policies
- Digital Research Data Sharing and Management Report from National Science Board Panel, March, 2011.
Journal Policies
- SHERPA RoMEO listing and classifying publisher copyright & self-archiving policies for a wide range of academic publishers.
- [IMU2010] IMU General Assembly, “Best Current Practices for Journals”, International Mathematical Union, available at: http://www.mathunion.org/fileadmin/CEIC/bestpractice/bpfinal.pdf
- Mathematical Programming Computation requiring code and data deposit as a condition of publication.
- American Economic Review, Data Submission Requirement.
- ACM Transactions on Mathematical Software Algorithms Policy.
- Science Magazine Policy on availability of data and code.
- Wouters et al., The Public Domain of Digital Research Data, 2003. Includes a discussion of the Journal of Cognitive Neuroscience's controversial 2000 requirement of fMRI data submission.
- Gary King's Sample Replication Journal Policy.
- Geophysics source-code guidelines.
- IPOL Software Guidelines and Author Manual. This is cross-referenced from the SIAM Journal on Imaging Science link.
- Geoscientific Model Development model description papers require source code.
- The bioinformatic-journal/software hybrid, 2009.
- Biostatistics kite-marks papers satisfying reproducibility requirements.
Legal Issues and Frameworks
- V. Stodden, Enabling Reproducible Research: Licensing for Scientific Innovation, 2009.
- V. Stodden, The Legal Framework for Reproducible Scientific Research: Licensing and Copyright, Computing in Science and Engineering, vol. 11, no. 1, pp. 35-40, Jan./Feb., 2009.
- The OECD Working Group on Neuroinformatics, Neuroscience Data and Tool Sharing: A Legal and Policy Framework for Neuroinformatics, Neuroinformatics 1(2), 2003.
- [HIPAAref]: http://www.hhs.gov/ocr/privacy/index.html
Licenses and copyright, citation
- V. Stodden, Legal Attribution and Academic Citation: The Promise of Facilitated CC License Compliance, 2009.
- [Stodden09] "The Legal Framework for Reproducible Research in the Sciences: Licensing and Copyright", IEEE Computing in Science and Engineering, 11(1), January 2009, p.35-40.
- [Stodden12] "Software Patents as a Barrier to Scientific Transparency: An Unexpected Consequence of Bayh-Dole," With I. Reich, The Seventh Annual Conference on Empirical Legal Studies (CELS 2012), Stanford, CA. Nov, 2012.
- Data producers deserve citation credit, Nature Genetics, 2009.
- D. Donoho, How to be a Highly Cited Author in the Mathematical Sciences, 2002.
- H. Piwowar et al., Sharing Detailed Research Data Is Associated with Increased Citation Rate, PLoS ONE, 2007.
- [Hodges2011] Wilfrid Hodges, “CEIC Copyright Recommendations: What do You Want from Your Publisher?”, link
Open Science
- Nature Blog 2008: The Great Beyond: Stiglitz and Sulston: Who Owns Science?
- University of Manchester Press Release 2008: Nobel duo ask: "Who Owns Science?"
- Example of social math link
Archiving
- P. Bourne et al., Open access: taking full advantage of the content, 2008.
- B. McCullough, Do economics journal archives promote replicable research?, 2008.
- M. Gerstein et al., Structured digital abstract makes text mining easy, 2007.
- H. Gibbs, DISC-UK DataShare: State-of-the-Art Review, 2007.
- M. Seringhaus et al., Towards tomorrow's information architecture, 2007.
- M. Barinaga, Still Debated, Brain Image Archives Are Catching On, Science, 2003.
- H. Piwowar, Data Sharing.
- G. Steinhart, DataStaR: An Institutional Approach to Research Data Curation, 2007.
- G. Steinhart et al., Establishing Trust in a Chain of Preservation, D-Lib Magazine, 2009.
Influencing Policy and Changing the Culture
- Arzberger et al., An International Framework to Promote Access to Data, Science, 2004.
- Science Code Manifesto link
- Reproducible Research, Computing in Science and Engineering, 2010.
- [Meyer2009] Bertrand Meyer, Christine Choppy, Jørgen Staunstrup and Jan van Leeuwen, “Research evaluation for computer science,” Communications of the ACM, vol. 52, no. 4, April 2009, pages 31-34, http://dl.acm.org/citation.cfm?doid=1498765.1498780, also available at http://medicina.unica.it/pacs/documenti/Research_Evaluation_CACM.pdf.
- [Patterson1999] David Patterson, Lawrence Snyder and Jeffrey Ullman, “Evaluating Computer Scientists and Engineers For Promotion and Tenure,” August, 1999, link.
Tools and Technologies
Version Control
Some version control systems (VCS) include:
Some public hosting sites for VCS repositories include:
Workflow Management Systems
- Galaxy link
- Madagascar link
- Sumatra link
- Taverna link
- VisTrails link
- Trident link
- VCR link Verifiable Computational Research
- D. Koop, E. Santos, P. Mates, H. Vo, P. Bonnet, B. Bauer, B. Surer, M. Troyer, D. Williams, J. Tohline, J. Freire and C. Silva, A Provenance-Based Infrastructure to Support the Life Cycle of Executable Papers, In Proceedings of the International Conference on Computational Science, 2011. link
- J. Freire and C. Silva, Making Computations and Publications Reproducible with VisTrails, in Computing in Science and Engineering 14(4): 18-25, 2012.
Literate Programming Tools
Some literate programming tools include:
Notebooks/Publishing Tools
Some notebook/publishing tools include:
- IPython link
- knitr: An R package for dynamic report generation (a web notebook based on knitr)
- Maple link
- Mathematica link
- Matlab link
- RStudio link
- Sage link
- Open notebook science Wikipedia
- eCAT Electronic Lab Notebook.
Tools that capture and preserve a software environment
Package code along with complete environment (OS, compilers, graphics tools, etc)
Cloud Computing
Web platforms for running code
Integrated tools for version control and collaboration
Interactive theorem proving
- A Special Issue of AMS Notices on Formal Proof [1]
- Computer-assisted theorem proving, e.g. Flyspeck
- Automated reasoning link
- Thomas C. Hales, “A proof of the Kepler conjecture,” Annals of Mathematics, vol. 162-3, pg. 1065-1185.
- NIST Digital Library of Mathematical Functions
Tools that can aid in reproducible research
These tools may be useful in conducting reproducible research.
- Matlab function that provides information about the CPU and operating system link
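For readers who do not work in Matlab, a similar record of the CPU and operating system can be collected in Python. The following is only a minimal sketch, not one of the tools listed on this page: the function name `describe_environment` is hypothetical, and it relies only on Python's standard `platform` and `sys` modules. Some fields (for example the processor string) may be empty on some systems.

```python
import json
import platform
import sys


def describe_environment():
    """Return a dictionary describing the interpreter, OS, and CPU.

    The function name and the choice of fields are illustrative assumptions.
    """
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "processor": platform.processor(),  # may be an empty string
        "byte_order": sys.byteorder,
    }


if __name__ == "__main__":
    # Store this output next to computed results so a run can be audited later.
    print(json.dumps(describe_environment(), indent=2, sort_keys=True))
```

Saving this output alongside results gives a later reader a starting point for explaining differences between runs on different machines.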
Numerical Reproducibility
- [Bailey2012] David H. Bailey, Roberto Barrio, and Jonathan M. Borwein, “High precision computation: Mathematical physics and dynamics,” Applied Mathematics and Computation, vol. 218 (2012), pg. 10106-10121.
- [Bailey1992] David H. Bailey, “Misleading performance reporting in the supercomputing field,” Scientific Programming, vol. 1 (Winter 1992), pg. 141-151.
Parallel Computing Issues
- [Borkar2012] Borkar, S. (2012) “Exascale Challenges, Why Resiliency?” talk presented at the Inter-Agency Workshop on HPC Resilience at Extreme Scale, February 21, 2012.
- [Constantinescu2000] Constantinescu, C. (2000) “Teraflops supercomputer: Architecture and validation of the fault tolerance mechanisms” IEEE Transactions on Computers 49:886-894.
- [DarpaResilience2009] (Mootaz) Elnozahy (editor), “System Resilience at Extreme Scale”, available at: http://institute.lanl.gov/resilience/docs/IBM%20Mootaz%20White%20Paper%20System%20Resilience.pdf
- [HECResilience2009] Nathan DeBardeleben, et al., “High‐End Computing Resilience: Analysis of Issues Facing the HEC Community and Path‐Forward for Research and Development”, available at: http://institute.lanl.gov/resilience/docs/HECResilience_WhitePaper_Jan2010_final.pdf
- [InterAgency2012] John Daly, et al., “Inter-Agency Workshop on HPC Resilience at Extreme Scale”, available at: http://institute.lanl.gov/resilience/docs/Inter-AgencyResilienceReport.pdf
- [Kola2005] Kola, G., Kosar, T. and M. Livny (2005) “Faults in large distributed systems and what we can do about them” Proceedings of the 11th European Conference on Parallel Processing (Euro-Par 2005).
- [Robey2011] Robey, R., Robey, J., and Aulwes, R., “In Search of Numerical Consistency in Parallel Computing”, Vol. 37, Issue 1, Jan 2011
- [TowardsExascaleResilience2009] Franck Cappello, et al., “Towards Exascale Resilience”, International Journal of High Performance Computing Applications, Vol 23, Issue 4, Nov 2009, pp 374-388.
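One reason numerical consistency is hard to obtain in parallel computing is that floating-point addition is not associative, so a sum reduced in a different order can give a different result. The sketch below illustrates this in plain Python. It is not taken from any of the works cited above: `naive_sum` is a hypothetical helper, and `math.fsum` is the standard-library function that returns a correctly rounded sum on typical IEEE 754 platforms.

```python
import math


def naive_sum(values):
    """Left-to-right floating-point summation; the result depends on order."""
    total = 0.0
    for v in values:
        total += v
    return total


# The same three numbers, summed in two different orders.
values = [1e16, 1.0, -1e16]
reordered = [1e16, -1e16, 1.0]

print(naive_sum(values))     # 0.0 (the 1.0 is lost when added to 1e16)
print(naive_sum(reordered))  # 1.0

# A correctly rounded sum does not depend on the order of the terms.
print(math.fsum(values), math.fsum(reordered))  # 1.0 1.0
```

A parallel reduction that splits the data differently from run to run behaves like the reordering here, which is why order-independent or enhanced-precision summation is one approach to run-to-run reproducibility.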
Silent Data Corruption
- [Autran2010] Autran, JL, Munteanu, D., Roche, P., Gasiot, G., Martinie, S., Uznanski, S., Sauze, S., Semikh, S., Yakushev, E., Rozov, S. et al. (2010) “Soft-errors induced by terrestrial neutrons and natural alpha-particle emitters in advanced memory circuits at ground level” Microelectronics Reliability 50: 1822-1831.
- [Li2010] Li, X., Huang, M.C., Shen, K. and L. Chu (2010) “A realistic evaluation of memory hardware errors and software system susceptibility” Proceedings of the 2010 USENIX conference on USENIX annual technical conference.
- [Michalak2010] Sarah Michalak (2010) “Soft Errors, Silent Data Corruption, and Exascale Computing,” invited talk at the Resilience Summit 2010, available: http://www.csm.ornl.gov/srt/conferences/ResilienceSummit/2010/pdf/michalak.pdf
- [Michalak2012] Sarah Michalak, Andrew DuBois, Curtis Storlie, Heather Quinn, William Rust, David DuBois, David Modl, Andrea Manuzzato and Sean Blanchard (2012) “Assessment of the Impact of Cosmic-Ray-Induced Neutrons on Hardware in the Roadrunner Supercomputer,” IEEE Transactions on Device and Materials Reliability 12:2, 445-454.
- [Constantinescu2005] Constantinescu, C. (2005) “Dependability Benchmarking Using Environmental Test Tools,” Proceedings of the 2005 Reliability and Maintainability Symposium 567-571.
Experimental Mathematics
- The Computer As Crucible: An Introduction to Experimental Mathematics, Jonathan Borwein and Keith Devlin. link
- D.H. Bailey and J. M. Borwein, “Exploratory Experimentation and Computation,” Notices of the AMS, 58 (10) (2011), 1410-1419.
- J.M. Borwein and R.E. Crandall, “Closed forms: what they are and why we care,” Notices Amer. Math. Soc. 60:1 (2013), xxx-xxx.
- Jonathan Borwein and Veselin Jungic, “Organic Mathematics then and now,” Notices of the AMS, 59 (2012), 416-419.
- Jonathan Borwein, Peter Borwein and Veselin Jungic, “Remote Collaboration: Six Years of the Coast-To-Coast Seminar Series,” Science Communication, 34 (3) (2012), 419-428.
- [Borwein2008] Jonathan M. Borwein and David H. Bailey, Mathematics by Experiment: Plausible Reasoning in the 21st Century, A K Peters, Natick, MA, 2008.
Education, Courses, and Training
Regular courses teaching some aspects of reproducibility
- Reproducible Research & Software Carpentry at UBC taught by Ian Mitchell and Dhavide Aruliah
- UT-Austin course (need link)
- CS 291 at KAUST (need link)
- High performance scientific computing course at Univ. of Washington taught by Randy LeVeque
- How to Write a Publishable Paper as a Class Project by Gary King
Short courses and summer schools
- Winter School in eScience on Reproducible Science And Modern Scientific Software, Geilo, Norway, 2013.
On-line tutorials and other sources
- Best Practices for Scientific Computing, arXiv.org, 2012.
- Workflows for Reproducible Research in Computational Neuroscience, Andrew Davison.
Other Readings and Publications
- N. Barnes, Publish your computer code: it is good enough, Nature 467 (2010) p. 753. link
- Z. Merali, Computational science: ...Error Why scientific programming does not compute. Nature 467(2010), pp. 775-777. link
- K. A. Baggerly and D. A. Berry, Reproducible Research, AMSTAT NEWS, Jan. 1, 2011 link
- A. Jogalekar, Computational research in the era of open access: Standards and best practices, Scientific American (2013) link
- A. Morin, J. Urban, P.D. Adams, I. Foster, A. Sali, D. Baker, and P. Sliz, Shining Light into Black Boxes, Science 336 (2012) link.
- Juliana Freire, Philippe Bonnet and Dennis Shasha, Computational reproducibility: state-of-the-art, challenges, and database research opportunities, in Proceedings of SIGMOD, 593-596 (2012).
- R. Peng, Reproducible Research and Biostatistics, Biostatistics, November 2009.
- S. Fomel and J. Claerbout, "Reproducible Research", Guest Editors' Introduction to a Special Issue of CiSE. link
- D. Donoho et al. Reproducible Research in Computational Harmonic Analysis, 2009. (pdf).
- I. Manolescu et al., Repeatability & Workability Evaluation of SIGMOD 2009, 2009.
- Philippe Bonnet, et al., Repeatability and workability evaluation of SIGMOD 2011, SIGMOD Record, 40, Issue 2 (June 2011), pp. 45-48.
- P. Vandewalle, J. Kovačević, and M. Vetterli, Reproducible Research in Signal Processing - What, Why, and How, 2009.
- J. J. Quirk, Computational Science "Same old silence, same old mistakes", something more is needed. link
- B. McCullough, Got Replicability? The Journal of Money, Credit and Banking Archive, 2007.
- B. McCullough, McGeary, and Harrison, Lessons from the JMCB Archive, 2006.
- MATLAB Blog 2009: Reproducible Research in Signal Processing.
- R. LeVeque, Wave propagation software, computational science, and reproducible research, 2006.
- J. Ioannidis, Why Most Published Research Findings Are False, 2005.
- R. Anderson et al. The Role of Data & Program Code Archives in the Future of Economic Research, 2005.
- A. Rossini, F. Leisch, Literate Statistical Practice, 2001.
- D. Donoho, J. Buckheit, WaveLab and reproducible research, 1995.
- G. King, Replication, Replication, 1995.
- K. Price, Anything You Can Do, I Can Do Better (No You Can’t)..., 1986.
- M. Schwab, N. Karrenbach and J. Claerbout, Making scientific computations reproducible, Computing in Science & Engineering, 2000.
- P. Schofield et al., Post-publication sharing of data and tools, Nature 461, 10 September 2009.
- WaveLab, reproducible research in wavelets.
- Gentleman et al., Bioconductor: Open Software Development for Computational Biology and Bioinformatics, 2004.
- EPFL Audiovisual Communications Lab page containing reproducible papers.
- SparseLab, reproducible research in sparse modeling and compressed sensing.
- R. LeVeque, Python Tools for Reproducible Research on Hyperbolic Problems, Computing in Science and Engineering, vol. 11, no. 1, pp. 19-27, Jan./Feb., 2009. [link with tools http://www.amath.washington.edu/~rjl/pubs/cise09].
- C. Savage and A. Vickers, Empirical Study of Data Sharing by Authors Publishing in PLoS Journals, 2009. (Some reactions to the article).
- J. Ioannidis et al., Repeatability of Published Microarray Gene Expression Analyses, 2008.
- K. Baggerly and K. Coombes, Deriving Chemosensitivity from Cell Lines: Forensic Bioinformatics and Reproducible Research in High-Throughput Biology, 2009.
- G. Baiocchi, Reproducible Research in Computational Economics: Guidelines, Integrated Approaches, and Open Source Software, 2007.
- Patterns of information use and exchange: case studies of researchers in the life sciences, Nov 2009.
- Alzheimer’s Disease Neuroimaging Initiative (ADNI) Data Website, and their data sharing and publication policy.
- Integrating with integrity, Nature Genetics, 2010.
- Sulston's 2002 account of the story of the 1996 Bermuda Declaration: "Heritage of Humanity".
- The Toronto Statement, 2009.
- E. Marshall, Bermuda Rules: Community Spirit, With Teeth, Science, 2001.
- [King2006] King, G. “Publication, Publication”. PS: Political Science and Politics, Vol. XXXIX, No. 1 (January, 2006), 119-125
- M. Hildebrandt, Profiling the European Citizen: Cross-disciplinary Perspectives, 2008.
- Brian Matthews, Brian McIlwrath, David Giaretta, Esther Conway, The Significant Properties of Software: A Study, JISC Report, 2008.
- [Hager2012] Georg Hager, “Fooling the masses – Stunt 9: Boast massive speedups with accelerators!,” blog, 3 Nov 2012, available at http://blogs.fau.de/hager/category/fooling-the-masses.
- [Panzer-Steindel2007] Panzer-Steindel, B. (2007) “Data integrity,” available: http://indico.cern.ch/getFile.py/access?contribId=3&sessionId=0&resId=1&materialId=paper&confId=13797.