A Partial Remedy to the Reproducibility Problem

Several years ago, John Ioannidis jolted the scientific establishment with an article titled “Why Most Published Research Findings Are False.” He raised concerns about inattention to statistical power, multiple inference issues and so on. Most people had already been aware of all this, of course, but the article opened the floodgates, and many more issues were brought up, such as hidden lab-to-lab variability. In addition, there is the occasional revelation of outright fraud.

Many consider the field to be at a crisis point.

At the 2014 Joint Statistical Meetings (JSM), Phil Stark organized a last-minute session on the issue, including Marcia McNutt, former editor of Science, and Yoav Benjamini of multiple inference methodology fame. The session attracted a standing-room-only crowd.

In this post, Reed Davis and I are releasing the prototype of an R package that we are writing, revisit, with the goal of partially remedying the statistical and data wrangling aspects of this problem. It is assumed that the authors of a study have supplied (possibly via carrots or sticks) not only the data but also the complete code for their analyses, from data cleaning up through formal statistical analysis.

There are two main aspects:

The package allows the user to “replay” the authors’ analysis and, most importantly, to explore alternative analyses that the authors may have overlooked. These alternative analyses may be saved for sharing.
The package warns of possible statistical errors, such as overreliance on p-values, the need for multiple inference procedures, possible distortion due to outliers, and so on.
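
To make the multiple inference concern concrete, here is a minimal sketch in base R. It is only an illustration of the kind of issue the package is meant to flag, and it does not use revisit's own functions; the simulated data and the variable names are assumptions made up for this example. It runs 20 t-tests in which the null hypothesis is true every time, then compares raw p-values with Bonferroni-adjusted and Benjamini-Hochberg-adjusted ones.

```r
# Hypothetical illustration in base R; this is NOT the revisit package's API.
set.seed(1)

# Simulate 20 two-sample t-tests where the null hypothesis holds in every test.
n_tests <- 20
pvals <- replicate(n_tests, t.test(rnorm(30), rnorm(30))$p.value)

# Naive count of "significant" results at the 0.05 level, with no adjustment.
naive_hits <- sum(pvals < 0.05)

# Adjusted p-values: Bonferroni controls the family-wise error rate,
# Benjamini-Hochberg ("BH") controls the false discovery rate.
p_bonf <- p.adjust(pvals, method = "bonferroni")
p_bh <- p.adjust(pvals, method = "BH")

# Show the five smallest raw p-values next to their adjusted versions.
result <- data.frame(raw = pvals, bonferroni = p_bonf, BH = p_bh)
print(head(result[order(result$raw), ], 5))

cat("Unadjusted hits:", naive_hits, "\n")
cat("Bonferroni hits:", sum(p_bonf < 0.05), "\n")
cat("BH hits:", sum(p_bh < 0.05), "\n")
```

With enough tests, some raw p-values will fall below 0.05 by chance alone, which is exactly the situation in which a reminder about multiple inference procedures is useful.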

The term user here could refer to people in several different situations:

The various authors of a study, collaborating and trying different analyses during the course of the study.
Reviewers of a paper submitted for publication on the results of the study.
Fellow scientists who wish to delve further into the study after it is published.

The package has text and GUI versions. The latter is currently implemented as an RStudio add-in.

The package is on my GitHub site, and has a fairly extensive README file introducing the goals and usage.

5 thoughts on “A Partial Remedy to the Reproducibility Problem”

Sir
I salute you for taking up this can of worms. The article you mention (the name of the author is, btw, Ioannidis, not Ionnidis) has been troubling me since its publication. Any scientific work is based on earlier studies, and if those are basically at fault, well, the whole fabric of one’s own work should be considered with unpleasant suspicions. Not a welcome thought!

matloff says:

June 2, 2017 at 12:32 am

Thanks for the spelling correction, the second time I’ve mangled someone’s name in this blog. 😦

But yes, this is very, very serious.

Mad (Data) Scientist

Musings, useful code etc. on R and data science
