If these data could talk

Thomas Pasquier*, Matthew K. Lau, Ana Trisovic, Emery R. Boose, Ben Couturier, Mercè Crosas, Aaron M. Ellison, Valerie Gibson, Chris R. Jones, Margo Seltzer

*Corresponding author for this work

Research output: Contribution to journal, Review article (Academic Journal), peer-reviewed

31 Citations (Scopus)
248 Downloads (Pure)


In the last few decades, data-driven methods have come to dominate many fields of scientific inquiry. Open data and open-source software have enabled the rapid implementation of novel methods to manage and analyze the growing flood of data. However, it has become apparent that many scientific fields exhibit distressingly low rates of reproducibility. Although there are many dimensions to this issue, we believe that research is described with too little formalism end to end, from the data source through the analysis to the final published results. Even when authors do their best to make their research and data accessible, this lack of formalism reduces the clarity and efficiency of reporting, which contributes to issues of reproducibility. Data provenance aids reproducibility through systematic and formal records of the relationships among data sources, processes, datasets, publications and researchers.
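As an illustration of what such a formal record might look like, provenance can be sketched as a directed graph whose nodes are the five kinds of item named above and whose labelled edges state how they relate. This is a minimal sketch, not the authors' method: the class `ProvenanceGraph`, its methods, the node identifiers, and the borrowing of relation names from the W3C PROV vocabulary (`used`, `wasGeneratedBy`, `wasDerivedFrom`, `wasAttributedTo`, `wasAssociatedWith`) are all illustrative assumptions.

```python
from collections import defaultdict

# Node kinds: the five categories named in the abstract (naming is hypothetical).
KINDS = {"data_source", "process", "dataset", "publication", "researcher"}


class ProvenanceGraph:
    """Hypothetical minimal provenance record: typed nodes plus labelled,
    directed edges. Relation names loosely follow W3C PROV; this is not a
    complete or validated PROV implementation."""

    def __init__(self):
        self.kind = {}                   # node id -> kind
        self.edges = defaultdict(list)   # node id -> [(relation, target id)]

    def add_node(self, node_id, kind):
        # Reject kinds outside the assumed vocabulary.
        if kind not in KINDS:
            raise ValueError(f"unknown kind: {kind}")
        self.kind[node_id] = kind

    def relate(self, src, relation, dst):
        # Both endpoints must already be recorded before they can be linked.
        for node in (src, dst):
            if node not in self.kind:
                raise KeyError(node)
        self.edges[src].append((relation, dst))

    def lineage(self, node_id):
        """Return every node reachable from node_id, i.e. everything the
        item depends on or is attributed to (depth-first traversal)."""
        seen, stack = set(), [node_id]
        while stack:
            current = stack.pop()
            for _, target in self.edges[current]:
                if target not in seen:
                    seen.add(target)
                    stack.append(target)
        return seen


# Usage with hypothetical identifiers: one source, one cleaning script,
# one derived dataset, one paper, one researcher.
g = ProvenanceGraph()
g.add_node("station_logs", "data_source")
g.add_node("clean.py", "process")
g.add_node("clean.csv", "dataset")
g.add_node("paper", "publication")
g.add_node("alice", "researcher")
g.relate("clean.py", "used", "station_logs")
g.relate("clean.py", "wasAssociatedWith", "alice")
g.relate("clean.csv", "wasGeneratedBy", "clean.py")
g.relate("paper", "wasDerivedFrom", "clean.csv")
g.relate("paper", "wasAttributedTo", "alice")

print(sorted(g.lineage("paper")))
```

Tracing the lineage of the publication recovers the dataset, the process that produced it, the original data source, and the responsible researcher, which is the kind of end-to-end account the abstract argues is usually missing from published results.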

Original language: English
Article number: 170114
Number of pages: 5
Journal: Scientific Data
Publication status: Published, 5 Sept 2017

