Wrong on an Astronomical Scale

I recently posted an update regarding our R package revisit, aimed at partially remedying the reproducibility crisis, both in the sense of (a) providing transparency to data analyses and (b) flagging possible statistical errors, including misuse of significance testing.

One person commented to me that it may not be important for the package to include warnings about significance testing. I replied that on the contrary, such problems are by far the most common in all of statistics. Today I found an especially egregious case in point, not only because of the errors themselves but even more so because of the shockingly high mathematical sophistication of the culprits.

This fiasco occurs in the article, “Gravitational Waves and Their Mathematics” in the August 2017 issue of the Notices of the AMS, by mathematics and physics professors Lydia Bieri, David Garfinkle and Nicolás Yunes. In describing the results of a dramatic experiment claimed to show the existence of gravitational wages, the authors state,

…the aLIGO detectors recorded the interference pattern associated with a gravitational wave produced in the merger of two black holes 1.3 billion light years away. The signal was so loud (relative to the level of the noise) that the probability that the recorded event was a gravitational wave was much larger than 5𝜎, meaning that the probability of a false alarm was much smaller than 10-7.

Of course, in that second sentence, the second half is (or at least reads as) the all-too-common error of interpreting a p-value as the probability that the null hypothesis is correct. But that first half (probability of a gravitational wage was much larger than 5𝜎) is quite an “innovation” in the World of Statistical Errors. Actually, it may be a challenge to incorporate a warning for this kind of error in revisit. 🙂

Keep in mind that the authors of this article were NOT the ones who conducted the experiments, nor were they even in collaboration with the study team. But I have seen such things a number of times in physics, and it is reminiscent of some controversy over the confirmation of the existence of the Higgs Boson; I actually may disagree there, but it again shows that, at the least, physicists should stop treating statistics as not worth the effort needed for useful insight.

In that light, this in-depth analysis by the experiments looks well worth reading.


Update on Our ‘revisit’ Package

On May 31, I made a post here about our R package revisit, which is designed to help remedy the reproducibility crisis in science. The intended user audience includes

  • reviewers of research manuscripts submitted for publication,
  • scientists who wish to confirm the results in a published paper, and explore alternate analyses, and
  • members of the original research team itself, while collaborating during the course of the research.

The package is documented mainly in the README file, but we now also have a paper on arXiv.org, which explains the reproducibility crisis in detail, and how our package addresses it. Reed Davis and I, the authors of the software, are joined in the paper by Prof. Laurel Beckett of the UC Davis Medical School, and Dr. Paul Thompson of Sanford Research.

Understanding Overhead Issues in Parallel Computation

In my talk at useR! earlier this month, I emphasized the fact that a major impediment to obtaining good speed from parallelizing an algorithm is systems overhead of various kinds, including:

  • Contention for memory/network.
  • Bandwidth limits — CPU/memory, CPU/network, CPU/GPU.
  • Cache coherency problems.
  • Contention for I/O ports.
  • OS and/or R limits on number of sockets (network connections).
  • Serialization.

During the Q&A at the end, one person in the audience asked how R programmers without a computer science background might acquire this information. A similar question was posed here today by a reader on this blog, to which I replied,

That question was asked in my talk. I answered by saying that I have an introduction to such things in my book, but that this is not enough. One builds this knowledge in haphazard ways, e.g. by search terms like “cache miss” and “network latency” on the Web, and above all, by giving it careful thought and reasoning things out. (When Nobel laureate Richard Feynman was a kid, someone said in awe, “He fixes radios by thinking!”)

Join an R Users Group, if there is one in your area. (And if not, then start one!) Talk about these things with them (though if you follow my above advice, you may find you soon know more than they do).

The book I was referring to was Parallel Computing for Data Science: With Examples in R, C++ and CUDA (Chapman & Hall/CRC, The R Series, Jun 4, 2015.

I have decided that the topic of system overhead issues in parallel computation is important enough for me to place Chapter 2 on the Web, which I have now done. Enjoy. I’d be happy to answer your questions (of a general nature, not on your specific code).

We are continuing to add more features to our R parallel computation package, partools. Watch this space for news!

By the way, the useR! 2017 videos are now on the Web, including my talk on parallel computing.

My Presentation at useR! 2017, Etc.

I gave a talk titled, “Parallel Computation in R:  What We Want, and How We (Might) Get It,” at last week’s useR! 2017 conference in Brussels. You can view my slides here, and I think the conference organizers said the videos would be placed online, not sure of that though.

The goal of the talk was to propose general design patterns for parallel computation in R, meaning general approaches that should be useful in many applications. I emphasized that this was just one person’s opinion, and expected the Spark fans to disagree with my view that Spark is not a very useful tool for useRs. Actually, several speakers in other talks were negative about Spark as well. One gentleman did try to defend Spark during the Q&A, but he talked to me afterward, and turned out not to be a huge Spark fan after all, largely just playing the devil’s advocate.

My examples of course involved partools, the package I’ve been developing for parallel computation in R. (Duncan Temple Lang’s PhD student Clark Fitzgerald is now involved in developing the package as well.) However, I noted that the same general principles could be applied with some other packages, such as ddR and multidplyr.

There were of course a number of excellent talks, many more than I could attend. Among the ones I did attend, I would mention a few in particular:

  • A talk by Nick Ulle, another student of Duncan’s, about his project to bring the LLVM compiler world to R. This is a tough challenge, but Nick is making impressive progress.
  • A talk by Kylie Bemis, a post doc at Northeastern University, and her matter file system R package, which does distributed file allocation in a clever, general manner.
  • I did not get to see Jim Harner’s talk about his R IDE, rc2  but he demonstrated it for me on his laptop, very interesting.
  • Microsoft’s David Smith, one of the pioneers of the S/R world, gave an interesting “then and now” talk, listing questions that non-useRs would ask a few years ago when he suggested their switching to R — but which they no longer ask, demonstrating the huge increase in R usage in recent years, and its increase in power and usability.

My wife and I had fun exploring Brussels — one wrong decision in a subway station resulted in our ending up in front of the EU headquarters, an interesting error to make. And by an amazing stroke of good luck, the other summer conference at which I’ll be giving a talk, Small Area Estimation 2017, is to be held in Paris the very next week.

A Partial Remedy to the Reproducibility Problem

Several years ago, John Ionnidis jolted the scientific establishment with an article titled, “Why Most Published Research Findings Are False.” He had concerns about inattention to statistical power, multiple inference issues and so on. Most people had already been aware of all this, of course, but that conversation opened the floodgates, and many more issues were brought up, such as hidden lab-to-lab variability. In addition, there is the occasional revelation of outright fraud.

Many consider the field to be at a crisis point.

In the 2014 JSM, Phil Stark organized a last-minute session on the issue, including Marcia McNutt, former editor of Science and Yoav Benjamini of multiple inference methodology fame. The session attracted a standing-room-only crowd.

In this post, Reed Davis and I are releasing the prototype of an R package that we are writing, revisit, with the goal of partially remedying the statistical and data wrangling aspects of this problem. It is assumed that the authors of a study have supplied (possibly via carrots or sticks) not only the data but also the complete code for their analyses, from data cleaning up through formal statistical analysis.

There are two main aspects:

  • The package allows the user to “replay” the authors’ analysis, and most importantly, explore other alternate analyses that the authors may have overlooked. The various alternate analyses may be saved for sharing.
  • Warn of statistical   errors, such as: overreliance on p-values; need for multiple inference procedures; possible distortion due to outliers; etc.

The term user here could refer to several different situations:

  • The various authors of a study, collaborating and trying different analyses during the course of the study.
  • Reviewers of a paper submitted for publication on the results of the study.
  • Fellow scientists who wish to delve further into the study after it is published.

The package has text and GUI versions. The latter is currently implemented as an RStudio add-in.

The package is on my GitHub site, and has a fairly extensive README file introducing the goals and usage.

Online But In-Class Examinations (with an R Example)

About a year-and-a-half ago, some students and I wrote OMSI, Online Measurement of Student Insight, an online software tool to improve examinations for students and save instructors lots of time and drudgery currently spent on administering exams. It is written in a mixture of Python and R. (Python because it was easier to find students for the project, R because it built upon a earlier system I had developed entirely in R.)

I will describe it below, but I wish to say at the outset: I NEED TESTERS! I’ve used it successfully in several classes of my own so far, but it would be great to get feedback from others. Code contribution would be great too.

From the project README file:

Students come to the classroom at the regular class time, just as with a traditional pencil-and-paper exam. However, they use their laptop computers to take the exam, using OMSI. The latter downloads the exam questions, and enables the students to upload their answers.

This benefits students. Again from the README:

  • With essay questions, you have a chance to edit your answers, producing more coherent, readable prose. No cross-outs, arrows, words squeezed in above a line, no points off for unreadable handwriting. 🙂
  • With coding questions, you can compile and run your code, giving you a chance to make corrections if your code doesn’t work.

In both of these aspects, OMSI gives you a better opportunity to demonstrate your insight into the course material, compared to the traditional exam format.

It is a great saver of time and effort for instructors, as the README says:

OMSI will make your life easier. 🙂

OMSI facilitates exam administration and grading. It has two components:

  • Exam administration. This manages the actual process of the students taking the exam. You get electronic copies of the students’ exams, eliminating the need for collecting and carrying out out a large number of papers, and making work sharing much easier among multiple graders. As noted in “Benefits for students” above, OMSI enables the student to turn in a better product, and this benefits the instructor as well: Better exam performance by students is both more gratifying to the instructor and also makes for easier, less frustrating grading.
  • Exam grading. OMSI does NOT take the place of instructor judgment in assigning points to exam problems. But it does make things much easier, by automating much of the drudgery. For instance, OMSI automatically records grades assigned by the instructor, and automatically notifies students of their grades via e-mail. Gone are the days in which the instructor must alphabetize the papers, enter the grades by hand, carry an armload (or boxload) of papers to give back to students in class, retaining the stragglers not picked up by the students, and so on.

Here is an R example showing sample exam questions. At the server, the instructor would place the following file:

QUESTION -ext .R -run "Rscript omsi_answer1.R"

Write R code that prints out the mean of R's 
built-in Niles dataset, starting with 
observation 51 (year 1921).
                                                                                QUESTION -ext .R -run "Rscript omsi_answer2.R"
Write an R function with call form g(nreps) 
that will use simulation to find the approximate value of E(|X - Y|) for 
independent N(0,1) variables X and Y.  
Here nreps is the number of 
replications.  Make sure to include 
a call print(g(10000)) in 
your answer, which will be run by

QUESTION -ext .R -run "Rscript omsi_answer3.R"                                   

Suppose X ~ U(0,1). Write an R function with 
call form g(t) which finds the density of X^2 
at t, for t in (0,1).  Make sure to include
a call print(g(0.8)) in your answer, which 
will be run by OMSI.


Suppose an article in a scientific journal states, "The treatment and nontreatment 
means were 52.15 and 52.09, with a p-value 
of 0.02.  So there is only a 2% chance that the treatment has no effect."  Comment on the propriety of that statement.

At the client side, the student would see this:


After the student enters an answer and hits Save and Run, the student’s code would be run in a pop-up window, displaying the result. When the student hits Submit, the answer is uploaded to the instructor’s server. There is a separate directory at the server for each student, and the answer files are stored there.

Again, the autograder does NOT evaluate student answers; the instructor does this. But the autograder greatly facilitates the process. The basic idea is that the software will display on the screen, for each student and each exam problem, the student’s answer.  In the case of coding questions, the software will also run the code and display the result.  In each case, the instructor then inputs the number of points he/she wishes to assign.

The package is easy to install and use, from both the student and instructor point of view. See the README for details.