ICERM Reproducibility in Computational and Experimental Mathematics: Readings and References


This page collects useful references for the ICERM workshop Reproducibility in Computational and Experimental Mathematics.

Short links: http://icerm.brown.edu/tw12-5-rcem-wiki.php, http://is.gd/RRlinks, or http://goo.gl/QbDOx.

Workshop Report: http://stodden.net/icerm_report.pdf


Materials from the ICERM Workshop

See also the abstracts posted on the workshop page (click on "Schedule and Supporting Material").

Thought Pieces Submitted by Participants

  • Randy LeVeque, Top Ten Reasons to Not Share Your Code (and why you should anyway). link
  • Nicolas Limare, Running a Reproducible Research Journal, with Source Code Inside. link
  • Sébastien Li-Thiao-Té, Literate Research versus Reproducible Research. link
  • Ursula Martin, The social machine of mathematics. link
  • Fernando Perez, Reproducible software vs. reproducible research. link
  • Todd Rosenquist and Shane Story, Using the Intel Math Kernel Library and Intel Compilers to obtain Numerical Run-to-run Reproducible Results. link original source
  • Anthony Scopatz, Passive Reproducibility: It’s Not You, It’s Me. link
  • Benjamin Seibold, Making reproducible computational research a reasonable choice for young faculty on tenure track. link

Slides from 5-Minute Lightning Talks

Wednesday

  • Noah Clemons, "How to Enforce Reproducibility with your Existing MKL Code" .pptx
3 Types of Non-Reproducibility in Intel® Math Kernel Library
  • Run to run on the same processor
  • Runs between different Intel processors
  • Runs between different IA-compatible processors
Conditional Numerical Reproducibility link
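To make the CNR idea above concrete, here is a minimal sketch (ours, not from the talk) of requesting conditional numerical reproducibility from Python; it assumes a NumPy build linked against MKL and uses Intel's documented MKL_CBWR environment variable, which must be set before MKL is first loaded.

# Minimal sketch (illustration only): request Intel MKL's Conditional
# Numerical Reproducibility (CNR) from Python. Assumes NumPy is linked
# against MKL; MKL_CBWR must be set before MKL is first loaded.
import os
os.environ.setdefault("MKL_CBWR", "COMPATIBLE")  # most conservative CNR branch

import numpy as np

rng = np.random.RandomState(0)
A = rng.rand(500, 500)
B = rng.rand(500, 500)

# With CNR enabled, repeated runs on the same or compatible processors should
# give bitwise-identical results for MKL-backed operations such as this one.
print(np.dot(A, B).sum())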
  • Neil Chue Hong, "The Foundations of Digital Research" .pdf
The Foundations of Digital Research link
  • Research
  • Careers
  • Recognition/Reward
  • Skills and Capability
Isn't software just data?
  • Journal of Open Research Software link
  • Role of repositories link
  • Publication for Discovery
Five stars of software
  • Existence - there is accurate metadata that defines the software
  • Availability - you can access and run the software
  • Openness - the software has an open, permissive license
  • Linked - the related data, dependencies and papers are indicated
  • Assured - the software provides ways of assuring its "correctness"
  • David Ketcheson, online demo link
  • Nicolas Limare, "My Christmas List for Reproducibility" .pdf
Tools and Infrastructure
  • software identifier
- a DOI-like system for software: vendor-independent, always pointing to the current location of the code
  • open data storage
- until every journal stores supporting materials
- more reliable than home-page storage
Standards
  • cross-library APIs
- we need to be able to replace a broken library with another
- need an interface and spec common to multiple implementations
  • programming language specs
- a programming language defined only by its current implementation will eventually break
- need a formal and stable spec of the language
  • software quality test tools and services
- guide code authors to improvements
- reduce the workload for referees
Copyright and patents
  • paper vs. software
- paper is for humans, software is for the computer
- they should have the same copyright status
  • no patent restriction for research
- the patent system is supposed to stimulate innovation
- an implementation is a translation of information already public in the patent application
- researchers should not be prevented from releasing the code
  • Sebastien Li-Thiao-Te, "Lepton : Literate Executable Papers" .pdf
Lepton: Literate Executable Papers link
  • Everyday tasks such as programming and writing technical reports
  • Reviewing the methods and results by collaborators and in the long term
  • Re-using source code, input data, research results
Conference papers
  • Sébastien Li-Thiao-Té. Literate program execution for reproducible research and executable papers. Procedia Computer Science, 9(0):439 – 448, 2012. ICCS 2012.
  • Sébastien Li-Thiao-Té. Literate program execution for teaching computational science. Procedia Computer Science, 9(0):1723 – 1732, 2012. ICCS 2012.
Lepton Provides:
  • provenance information
-generated documents contain all the information required to reproduce the results
  • executable papers
-a Lepton file is a program and can be executed on the local machine
  • coherence and correctness guarantees
- Lepton executes commands and automatically embeds their output (a generic sketch of the idea appears after this summary)
- no copy and paste
  • literate programming features
-everything in the same bundle
-related items placed in close proximity
-meaningful chunks
  • generated, up-to-date documentation
-run benchmarks with scripts in any language
-format the results into tables
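A generic sketch of the "no copy and paste" mechanism described above (ours, not Lepton's actual syntax): run the command the document refers to and splice its real output into the generated report; the benchmark command below is a stand-in.

# Generic sketch of automatic output embedding (not Lepton's actual syntax):
# execute the command a document refers to and embed its real output in the
# generated report, so nothing is copied and pasted by hand.
import subprocess
import sys

template = """# Benchmark report

The timing reported by the benchmark script was:

{timing_output}
"""

# Stand-in for any script in any language; replace with the real benchmark.
cmd = [sys.executable, "-c", "print('elapsed: 1.23 s')"]
result = subprocess.run(cmd, capture_output=True, text=True, check=True)

with open("report.md", "w") as f:
    f.write(template.format(timing_output=result.stdout.strip()))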
  • Benjamin Seibold, .pdf
  • Matthias Troyer, "Publishing executable papers" .pdf
Numerical experiments + theorem and proof
  • Can we build quantum computers based on non-unitary conformal field theories?
  • First reproducible numerical experiment, then theorem and proof.
An executable paper
  • Clicking on the figure downloads the VisTrails workflow that reproduces the figure
Publishing requires compromises
  • No stable URL or DOI for supplementary material
  • No link from the figure, but only a reference
Intermediate solution
  • Publish raw data and workflows through our institutional library and obtain DOIs
  • Refer to that data from the paper and just include a backup copy with the papers
  • Yihui Xie, "knitr: Starting From Reproducible Homework" .pdf
knitr
  • Wikipedia article link
  • Official page link
Reproducible homework examples link
Example of power and imagination of students link
If reproducible homework comes, can reproducible research be far behind?

Thursday

  • Lorena Barba, "Reproducibility PI Manifesto" .pdf figshare
(1) I will teach my graduate students about reproducibility
a) Lab notebook
b) version control
c) workflow
d) publication-quality plots at group meetings
(2) All our research code (and writing) is under version control
a) Local svn repo for prototypes in Python/Matlab/CUDA C and for LaTeX documents (reports, manuscripts, etc.)
b) Google code for released research codes
c) Bitbucket or Github for collaborative projects
(3) We will always carry out verification and validation
V&V reports are posted to figshare. Example: Validation of the cuIBM code for Navier-Stokes equations with immersed boundary methods. Anush Krishnan, Lorena A. Barba. figshare. Retrieved 18:16, Dec 12, 2012 (GMT) http://dx.doi.org/10.6084/m9.figshare.92789
(4) For main results in a paper, we will share data, plotting script & figure under CC-BY
Posted to figshare; get a DOI and use it in the paper under CC-BY, with citation.
Example: Weak scaling of parallel FMM vs. FFT up to 4096 processes. Lorena A. Barba, Rio Yokota. figshare. Retrieved 18:23, Dec 12, 2012 (GMT) http://dx.doi.org/10.6084/m9.figshare.92425
(5) We will upload the preprint to arXiv at the time of submission of a paper.
... & update the preprint post peer-review.
(6) We will release code at the time of submission of a paper under MIT license.
As preparatory measure: I will declare this intention in grant proposals.
I have endorsed the Science Code Manifesto. http://sciencecodemanifesto.org
(7) We will add a “Reproducibility” declaration at the end of each paper.
(8) I will keep an up-to-date web presence.
Corollary: I will develop a consistent open science policy.
Three themes
1. New publication models
2. Workflow standards
3. Social dynamic
  • Adam Asare, "ITN TrialShare: Promoting reproducible research and transparency in clinical trials" .pptx
Clinical trials consortium whose mission is to accelerate the development of immune tolerance therapies
  • Therapeutic areas include: allergy/asthma, transplantation, and autoimmune diseases
  • Over 50 Phase I/II clinical trials
  • NIH/NIAID funded over 12 years at ~$30M/year
  • Numerous academic and industry partners
Core objectives: Innovation and Collaboration
  • Develop novel assay technologies
  • Support rigorous and reproducible research
  • Adhere to standardized platforms and processes
  • Sara Billey, "Canonical Representations of Theorems" .pptx
  • Sarah Michalak, "Silent Data Corruption and Other Anomalies" .pdf
Silent Data Corruption
  • "SDC occurs when incorrect data is delivered by a computing system to the user without any error being logged" C. Constantinescu (AMD)
  • SDC can have many causes [Constantinescu 2008]
- Environmental (temperature/voltage fluctuations; particles), manufacturing residues, oxide breakdown, electro-static discharge
  • New technologies may experience more data corruptions [Borkar 2012]
- Increasing number of bits/latches, noise levels; decreasing voltages
  • Can affect desktops/laptops and larger-scale systems
  • Greater issue with increasing scale
- ECC/parity provide protection, but perfect systems can’t be assumed
  • Impact is hard to quantify precisely: SDC is silent and rare
- Different applications may have different susceptibility
NL SDC Research
  • Platform Testing:
- 9 platforms tested: all with HPL; 5 with HPL and Crisscross interconnect tester
- HPL SDC observed on 4 platforms post-decommissioning
  • Laboratory Testing:
- Dual-core 65nm processor on a high-performance overclocking motherboard; manipulated frequency, voltage, and fan speed/temperature while running Linpack
- Multiple forms of SDC outside of nominal conditions
- Incorrect Linpack results, erroneous timestamps/environmental data
- Other Errors: program termination with no error, program crash, system crash
  • Neutron Beam Testing of Roadrunner (1st petaflop/s supercomputer):
- 2 SDCs observed on Cell processor; 2 SDCs observed on Opteron processor
- Estimate 1 cosmic-ray-neutron-induced SDC every 1.5 months of operation
Other Anomalies
  • Data parsing (grep?) issue: different counts of failed HPL calculations
- 5 false positives in 10M lines parsed; rate replicated on second cluster
- Appears related to a binary failing to link to a required dynamic library; NOT silent if you capture all messages, but how many people do this?
  • R duplicated lines when reading in a file with the read.table command
Computational Reproducibility Must Account for SDC and Other Anomalies
  • Ian Mitchell, "Reproducibility(?) Review Proposal" .pdf
Reproducibility(?) Review Proposal
  • Proposal to evaluate reproducibility of submissions to an annual CS / engineering conference
- Conference is ACM sponsored & published
- Accepts 25 to 30 papers / year, perhaps 70% include a computational element
- Program committee has expressed support
  • Inspired by SIGMOD “repeatability & workability” evaluation procedures
- Bonnet et al, SIGMOD Record, DOI: 10.1145/2034863.2034873
  • This conference has much more homogeneous computational efforts than SIGMOD
- Typically a handful of plots generated by a few hundred lines of Matlab that runs in a few hours on a laptop
Procedure
  • Repeatability Evaluation Committee (REC)
- Get recommendations for postdocs / senior grad students from members of the program committee (PC)
- Papers go through normal PC review process
- Authors of accepted papers are invited to submit a repeatability package (RP) at time of final paper submission
- Authors and REC are provided evaluation criteria in advance
  • RP contains a document, software and data
- Document explains what elements of the paper are repeatable, system requirements and a procedure for installation, execution and extraction of results
- Software can be provided by: link to public repository, archive file, VM, AMI, runmycode.org, …?
Evaluation Criteria
  • Three criteria, each rated 0-4
- “Repeatable” if average score of 2, all scores > 0
- Not clear how to combine scores from different reviewers
- Not clear what elements of the reviews should be public
- If not repeatable, no effect on the paper
- If repeatable, the instruction document must be included in ACM DL supplemental material, small software and data could be included
  • Criterion 1: Coverage
- 0: no computational elements are repeatable
- 1: at least one repeatable element
- 2: majority of elements are repeatable
- 3: all repeatable and/or most extensible
- 4: all extensible
Evaluation Criteria
  • Criterion 2: Instructions
- 0: none included
- 1: installation instructions but little else
- 2: for every computational element that is repeatable there is a specific instruction explaining how to repeat it
- 3: there is a single command that almost exactly recreates each repeatable element
- 4: additional explanations of design decisions, extensions, …
  • Criterion 3: Quality(?)
- 0: No evidence of documentation or testing
- 1: The purpose of almost all files is documented
- 2: Almost all elements within source code and all data file formats are documented
- 3: At least some components of the code have some testing
- 4: Significant unit and system test coverage
  • Geoffrey Oxberry, "Towards Turnkey Reproducibility" .pdf
Problem: Reproducing someone's work can be hard.
  • Need to install necessary software (assume open source)
- Takes time, expertise, patience, privileges
- Could affect system stability
  • Could wrap source in VM (virtual machine) image
- Usually requires 300 MB to host; big, unwieldy
- No separation of source code & environment means no flexibility
  • High barrier means people don't run the code
  • Lower hosting & time barrier by specifying environment in separate repo using configuration management software
Solution: Specify environment with configuration management software
  • Config management tools specify config in text files
- Shell scripts (simplest, fewest prepackaged features)
- Puppet (puppetlabs.com)
- Chef (wiki.opscode.com)
- Fabric (http://docs.fabfile.org/en/1.5/)
- Related: Hashdist, Blueprint, Reprozip, others...
  • Instantiate config using virtualization tools
- Serial, small parallel jobs: Vagrant (vagrantup.com) + VirtualBox (virtualbox.org)
- StarCluster (star.mit.edu/cluster)
- CloudFormation (aws.amazon.com/cloudformation)
- Any other virtualization software + hardware
- Use web services instead (like Wakari, RunMyCode)
  • The idea is flexibility: pick and choose (even none, or a mix); see the Fabric-style sketch at the end of this talk's summary
Example: Install Python interface for DASSL
  • DASSL: differential-algebraic equation solver package in Fortran (L. Petzold)
  • PyDAS: Python interface to DASSL (J. W. Allen, on GitHub)
  • Example: Solve Robertson problem in IPython notebook using PyDAS
  • Presentation, environment and source repos on https://github.com/goxberry (all labeled with ICERM-2012)
- Requires Vagrant + VirtualBox
- Vagrantfile to specify VM to create (here, Ubuntu 12.04)
- Configuration in Puppet
- README with directions for running software
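To illustrate the "specify the environment as text" approach above, here is a minimal, hypothetical sketch in the style of Fabric 1.x (one of the tools listed); the host, packages, and repository URL are placeholders, not taken from the talk.

# fabfile.py -- minimal sketch of describing an environment as text with
# Fabric 1.x (one of the configuration-management options listed above).
# The host, packages, and repository URL are illustrative placeholders.
from fabric.api import env, run, sudo

env.hosts = ["vagrant@127.0.0.1:2222"]  # e.g., a local Vagrant/VirtualBox VM

def provision():
    """Install the compilers and libraries the experiment needs."""
    sudo("apt-get update -qq")
    sudo("apt-get install -y gfortran python-numpy git")

def fetch_and_build():
    """Check out the experiment's source and build it in a known location."""
    run("git clone https://github.com/example/experiment.git ~/experiment")
    run("cd ~/experiment && make")

# Usage from the host machine:
#   fab provision fetch_and_build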
  • Bob Robey, "Enhanced Precision Sums for Parallel Computing Reproducibility" .pdf
EPSum–The Problem
  • Different results when varying the number of processors
  • The order of operations changes with the number of processors
  • Precision errors are the same order of magnitude as programming errors
EPSum–Kahan Sum
  • Add carry digits through a second double-precision number
EPSum–The Code
  • In practice, the Kahan sum produced as good a result as the Knuth sum at lower cost (a minimal sketch of the Kahan sum follows below)
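A minimal sketch of the compensated (Kahan) summation idea described above, carrying the bits lost at each addition in a second double-precision variable:

# Kahan (compensated) summation: the low-order bits lost at each addition
# are carried in a second double-precision accumulator.
def kahan_sum(values):
    total = 0.0
    carry = 0.0                    # compensation for lost low-order bits
    for x in values:
        y = x - carry              # apply the carried correction to the addend
        t = total + y              # low-order digits of y may be lost here
        carry = (t - total) - y    # recover what was just lost
        total = t
    return total

# Example: each 1.0 added naively to 1e16 is rounded away; Kahan keeps them.
data = [1e16] + [1.0] * 1000000
print(sum(data))        # 1e+16
print(kahan_sum(data))  # ~1.0000000001e+16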
  • Michael Rubenstein, "The role of computation and data in my number theoretic work" .pdf
Computation in number theory can:
  • contribute to rigorous mathematical proof.
  • help verify conjectures experimentally.
  • stimulate mathematical discovery, theorems, relationships, conjectures, and uncover phenomena.
  • motivate work on algorithms and complexity (exs: factoring, primality testing, Ghaith Hiary’s algorithm for computing the zeta function).
  • involve automated theorem proving and proof verification.
What does it mean for a computation to be rigorous? Even assuming that the algorithms/methods are correct in principle, how does one certify potentially thousands of lines of code as correct when there are many places where things can go wrong:
  • mathematical coding errors (ex: signs, branches of log).
  • programming errors (ex: wrong loop or array bounds, memory issues, confusing types).
  • loss of precision (accumulated round-off, cancellation), wrong error analysis/inequalities.
  • reliance on black boxes: computer chips (Intel division bug), compilers and optimizers (constantly releasing new versions with bug fixes), computer memory (can get corrupted), closed source software (Mathematica), other people’s packages, and using them in ways that were not originally intended or foreseen.
At present we usually declare a program to be bug free once it produces output that is consistent with our expectations. We tend to give more trust to computational results that are reproducible using separate code and hardware, or for which there is more than one method that gives the same output, or for which the correctness of the result can be easily tested (examples: factorization of an integer, explicit formula for zeros of an L-function).


  • Fernando Seabra Chirigati, "ReproZip: Packing Experiments for Sharing and Publication" .pptx
Motivation
  • Published articles are not made reproducible
  • Computational reproducibility may be difficult to achieve
ReproZip is a packaging solution
  • It makes it easier for authors to pack experiments and for reviewers to verify computational results
It creates reproducible packages from existing experiments on computational environment E
  • No need to port experiments to other system
  • Leverages provenance of computational results
It unpacks an experiment on computational environment E’
It generates a workflow specification that encapsulates the execution of the experiment

Breakout Group Summary Slides

Wednesday

  • Funding Policy Group .pdf
Funding Issues
  • Separate funding for reproducibility, or should it be baked into current practice?
  • Supply templates for data management plan to include open and available software
  • Encourage NSF etc. to list reproducibility as a specific example of the "broader impacts" of research
Public policy
  • How to demonstrate that reproducibility is a 'more productive way to do science' (Dave Donoho)
  • Principled approaches to programming (early career training workshops - e.g. project NExT)
  • "set the default to open" statements (from professional organizations)
Publicity
  • Compilation of positive 'stories' in support of reproducibility
  • Difference between building a telescope and making data/code open and available
  • Is open source the basis of open science?
  • Journals/Publication Policy Group .pptx
Recommend that we put forth a set of best practices for what authors should do
  • Set of procedures for authors, referees, and editors
  • Put forth a rubric for rating papers that all can use
- Individual journals could adapt this as appropriate
  • Could pull examples from sources where reproducibility is encouraged currently
- SIGMOD
- IPOL
What would best practices include?
  • VMs (full VMs supplied by authors, reference VMs, and partial VMs)
- Pros: can execute anywhere
- Cons: big (IPOL does not allow VMs for this reason; reference and partial VMs would make these submissions smaller); proprietary software is an issue
  • All source codes
- Pros: All source present
- Cons: Hard to install and run in general (can specify compilers and make procedures to help with this)
  • Code excerpts with implementations of relevant algorithms (ETH requires this; Science requires this and has retracted papers over it)
How to introduce this expectation into the published literature
  • Invite accepted papers to submit to a reproducibility review
- If not reproducible as submitted, ask for more information to bring them up to the standard
  • Develop a special issue where papers undergo a reproducibility review (like an editors' choice issue)
  • Overlays for arXiv, other archives, or journals (like a certification webpage) with links to "certified" papers
  • Supplementary journals like SIAM Imaging Science and IPOL
  • Certifying journals on sites like SHERPA/RoMEO or Thomson ISI
  • Numerical Reproducibility Group .pptx

Thursday

  • Ontology and V&V Group .pptx
"We have a habit in writing articles published in scientific journals to make the work as finished as possible, to cover up all the tracks, to not worry about the blind alleys or describe how you had the wrong idea first. . . . So there isn't any place to publish, in a dignified manner, what you actually did in order to get to do the work." — Richard Feynman (Nobel acceptance 1966)
Taxonomy of types of reproducibility
  • Science that can be independently replicated and validated by the community
  • Given a code, you turn a crank, but are you independently reproducing the result? Levels of reproducibility
  • There is a cultural environment on how we report results
  • Publish or perish puts pressure on publishing
Hierarchy of levels of reproducibility
  • Hierarchy of levels of claims
  • Referees and editors can expect authors have done due diligence in verifying claims
  • Levels of reproducibility
  • Algorithm without details versus using a package (with version)
Outcome
  • Field is beyond reproach, “High Integrity by Design”
  • Examples:
- Japanese research on stem cells
- Performance claims of particular algorithms on parallel computers
  • Failure to do so will impede progress in a field
  • Establish and deserve the trust of the public
  • Develop a chain of trust
Verification
  • Verification: provide evidence that the implementation correctly solves the mathematical equations (an illustrative convergence check appears after this group's summary)
- Verification is a prerequisite to reproducibility
- Any code user should cite or provide independent verification
- Verification is never complete: the level of verification should be commensurate with the intended use of the code
Validation/UQ
  • Identify the sources of uncertainties
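As one concrete, illustrative form of verification evidence (our example, not from the slides): check that a solver's observed order of accuracy matches its expected order on a problem with a known exact solution.

# Illustrative verification check: confirm that a forward-Euler solver for
# y' = -y, y(0) = 1 shows its expected first-order convergence against the
# exact solution y(1) = exp(-1).
import math

def euler_solve(h, t_end=1.0):
    """Integrate y' = -y from y(0) = 1 with forward Euler; return y(t_end)."""
    y = 1.0
    for _ in range(int(round(t_end / h))):
        y += h * (-y)
    return y

exact = math.exp(-1.0)
errors = [abs(euler_solve(h) - exact) for h in (0.1, 0.05, 0.025, 0.0125)]

# Observed order p = log2(e_h / e_{h/2}); it should approach 1 for Euler.
for e_coarse, e_fine in zip(errors, errors[1:]):
    print("observed order of accuracy:", math.log2(e_coarse / e_fine))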
  • Rewards/Culture Group .pptx
Why?
  • Reproducibility in computational mathematical research makes your life better because:
- better science
- it generates collaboration
- increased citations
- your research group will become more productive
How?
  • Authors of code and data should make them as publicly available as possible for reproducibility/verifiability,
- subject to the rules and regulations of their funding agency or employer, reserving rights for the author(s)/owner(s): copyright, patent rights, national security, medical privacy, etc.
  • Some guidelines on publishing and/or disclosing software should be given in a "how to" format.
- The first step is to put your name on all your publicly available products.
  • Conferences, societies, a special edition of a journal, etc. could give an award based on a software contribution to the community. A small monetary contribution from their own funds would be a good extra step but is not strictly necessary.
  • Grant proposals should be written with funding in mind: software must be identified as part of both the research activity and the research infrastructure.
  • Teaching Reproducibility Group link
A Course in Reproducible Research
  • Some content is generic, some is domain specific
  • Generic content must be illustrated by domain specific examples
  • Meta issues
-Where should these topics be taught? Single generic course is unlikely to gain traction (at least in established institutions), so piecemeal inclusion into other computationally oriented courses is more likely to work
-How can we share course material that is developed? Creative commons (but be careful about NC restrictions)
-Examples: Software Carpentry bootcamps, reproducible science winter school in Geilo Norway, HPC course at University of Washington, computational science course at KAUST, others?
Hands on topics
  • Software Carpentry
-version control
-scripting
-databases
-build systems
-unit testing
  • Testing
-system / regression
-V&V / UQ
-continuous integration
-Provenance
-Reproducibility in statistical and probabilistic computations
  • Research documentation
-lab notebook
-research compendium
-literate programming
  • Programming (not in SWC)
-debuggers
-how to write good code
-floating point / nondeterminism
-documentation
  • Big challenges
-high performance computing
-big data
-cloud computing
-complicated SW stacks / toolchains (solution is VMs)
Lecture topics
  • IP / licensing
  • Citation / attribution
  • examples of RR (both good & bad; ideally domain specific)
  • publishing / repositories / archives
  • generic scientific software requirements

Final Report

To appear; see the Workshop Report link at the top of this page.

References and Links Collected

Previous Workshops and Roundtables on Reproducible Research

Why Reproducibility is an Issue

Examples where Lack of Reproducibility Causes Problems

Notions of Reproducibility

A variety of terminology is used in connection with reproducible research. The Final Report contains a section on Terminology; below are some links related to some of these terms.

Reproducible/Replicable/Auditable Research

Verification and Validation (V&V)

  • Example paper following V&V: William J. Rider and Douglas B. Kothe, Reconstructing Volume Tracking, Journal of Computational Physics, Volume 141, Issue 2, 10 April 1998, Pages 112-152.

Uncertainty Quantification

Quantifying the uncertainty in a computation: Wikipedia

Identical Code Output

Code Archival

Policies on Data and Code Sharing

Funding Agency Policies

Journal Policies

Legal Issues and Frameworks

Licenses and copyright, citation

  • [Stodden09] "The Legal Framework for Reproducible Research in the Sciences: Licensing and Copyright", IEEE Computing in Science and Engineering, 11(1), January 2009, p.35-40.

  • [Stodden12] "Software Patents as a Barrier to Scientific Transparency: An Unexpected Consequence of Bayh-Dole," With I. Reich, The Seventh Annual Conference on Empirical Legal Studies (CELS 2012), Stanford, CA. Nov, 2012.
  • [Hodges2011] Wilfrid Hodges, “CEIC Copyright Recommendations: What do You Want from Your Publisher?”, link

Open Science

  • Example of social math link

Archiving

Influencing Policy and Changing the Culture

  • Science Code Manifesto link
  • [Patterson1999] David Patterson, Lawrence Snyder and Jeffrey Ullman, “Evaluating Computer Scientists and Engineers For Promotion and Tenure,” August, 1999, link.

Tools and Technologies

Version Control

Some version control systems (VCS) include:

Some public hosting sites for VCS repositories include

Workflow Management Systems

  • D. Koop, E. Santos, P. Mates, H. Vo, P. Bonnet, B. Bauer, B. Surer, M. Troyer, D. Williams, J. Tohline, J. Freire and C. Silva, A Provenance-Based Infrastructure to Support the Life Cycle of Executable Papers, In Proceedings of the International Conference on Computational Science, 2011. link

Literate Programming Tools

Some literate programming tools include:

Notebooks/Publishing Tools

Some notebook/publishing tools include:

Tools that capture and preserve a software environment

Package code along with the complete environment (OS, compilers, graphics tools, etc.)

Cloud Computing

Web platforms for running code

Integrated tools for version control and collaboration

Interactive theorem proving

Tools that can aid in reproducible research

These tools may be useful in conducting reproducible research.

  • Matlab function that provides information about the CPU and operating system link

Numerical Reproducibility

  • [Bailey2012] David H. Bailey, Roberto Barrio, and Jonathan M. Borwein, “High precision computation: Mathematical physics and dynamics,” Applied Mathematics and Computation, vol. 218 (2012), pg. 10106-10121.
  • [Bailey1992] David H. Bailey, “Misleading performance reporting in the supercomputing field,” Scientific Programming, vol. 1 (Winter 1992), pg. 141-151.

Parallel Computing Issues

  • [Borkar2012] Borkar, S. (2012) “Exascale Challenges, Why Resiliency?” talk presented at the Inter-Agency Workshop on HPC Resilience at Extreme Scale, February 21, 2012.
  • [Constantinescu2000] Constantinescu, C. (2000) “Teraflops supercomputer: Architecture and validation of the fault tolerance mechanisms” IEEE Transactions on Computers 49:886-894.
  • [Kola2005] Kola, G., Kosar, T. and M. Livny (2005) “Faults in large distributed systems and what we can do about them” Proceedings of the 11th European Conference on Parallel Processing (Euro-Par 2005).
  • [Robey2011] Robey, R., Robey, J., and Aulwes, R., “In Search of Numerical Consistency in Parallel Computing”, Vol. 37, Issue 1, Jan 2011
  • [TowardsExascaleResilience2009] Franck Cappello et al., “Towards Exascale Resilience”, International Journal of High Performance Computing Applications, Vol. 23, Issue 4, Nov 2009, pp. 374-388.

Silent Data Corruption

  • [Autran2010] Autran, JL, Munteanu, D., Roche, P. , Gasiot, G., Martinie, S., Uznanski, S., Sauze, S., Semikh, S., Yakushev, E., Rozov, S. et al. (2010) “Soft-errors induced by terrestrial neutrons and natural alpha-particle emitters in advanced memory circuits at ground level” Microelectronics Reliability 50: 1822-1831.
  • [Li2010] Li, X., Huang, M.C., Shen, K. and L. Chu (2010) “A realistic evaluation of memory hardware errors and software system susceptibility” Proceedings of the 2010 USENIX conference on USENIX annual technical conference.
  • [Michalak2012] Sarah Michalak, Andrew DuBois, Curtis Storlie, Heather Quinn, William Rust, David DuBois, David Modl, Andrea Manuzzato and Sean Blanchard (2012) “Assessment of the Impact of Cosmic-Ray-Induced Neutrons on Hardware in the Roadrunner Supercomputer,” IEEE Transactions on Device and Materials Reliability 12:2, 445-454.
  • [Constantinescu2005] Constantinescu, C. (2005) “Dependability Benchmarking Using Environmental Test Tools,” Proceedings of the 2005 Reliability and Maintainability Symposium 567-571.

Experimental Mathematics

  • The Computer As Crucible: An Introduction to Experimental Mathematics, Jonathan Borwein and Keith Devlin. link
  • [Borwein2008] Jonathan M. Borwein and David H. Bailey, Mathematics by Experiment: Plausible Reasoning in the 21st Century, A K Peters, Natick, MA, 2008.

Education, Courses, and Training

Regular courses teaching some aspects of reproducibility

Short courses and summer schools

On-line tutorials and other sources


Other Readings and Publications

  • N. Barnes, Publish your computer code: it is good enough, Nature 467 (2010) p. 753. link
  • Z. Merali, Computational science: ...Error. Why scientific programming does not compute. Nature 467 (2010), pp. 775-777. link
  • K. A. Baggerly and D. A. Berry, Reproducible Research, AMSTAT NEWS, Jan. 1, 2011 link
  • A. Jogalekar, Computational research in the era of open access: Standards and best practices, Scientific American (2013) link
  • A. Morin, J. Urban, P.D. Adams, I. Foster, A. Sali, D. Baker, and P. Sliz, Shining Light into Black Boxes, Science 336 (2012) link.
  • S. Fomel and J. Claerbout, "Reproducible Research", Guest Editors' Introduction to a Special Issue of CiSE. link
  • Philippe Bonnet, et al., Repeatability and workability evaluation of SIGMOD 2011, SIGMOD Record, 40, Issue 2 (June 2011), pp. 45-48.
  • J. J. Quirk, Computational Science: "Same old silence, same old mistakes"; something more is needed. link
  • WaveLab, reproducible research in wavelets;
  • SparseLab, reproducible research in sparse modeling and compressed sensing;
  • [King2006] King, G. “Publication, Publication”. PS: Political Science and Politics, Vol. XXXIX, No. 1 (January, 2006), 119-125