Base-R Is Alive and Well

As many readers of this blog know, I strongly believe that R learners should be taught base-R, not the tidyverse. Eventually the students may settle on using a mix of the two paradigms, but at the learning stage they will benefit from the fact that base-R is simpler and more powerful. I’ve written my thoughts in a detailed essay.

One of the most powerful tools in base-R is tapply(), a real workhorse. I give several examples in my essay in which it is much simpler and easier to use that function than the tidyverse equivalent.
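As a quick illustration of the kind of call in question, here is a minimal sketch. The data and variable names are made up for this example and are not taken from the essay. tapply() splits a vector by one or more grouping factors and applies a function within each group:

```r
# Toy data, invented for illustration: wages with two grouping variables
wage   <- c(50, 60, 70, 80, 90, 100)
occ    <- c("a", "b", "a", "b", "c", "c")
gender <- c("f", "m", "m", "f", "f", "m")

# Mean wage within each occupation; the result is a named 1-d array
means <- tapply(wage, occ, mean)
print(means)

# Grouping by two factors gives a matrix: occupations in rows, genders in columns
by_both <- tapply(wage, list(occ, gender), mean)
print(by_both)
```

The result can be indexed by group name, e.g. means["a"] or by_both["b", "m"], which is convenient when the group summaries feed into later computation.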

Yet somehow there is a disdain for tapply() among many who use and teach Tidy. To them, the function is the epitome of “what’s wrong with” base-R. The latest example of this attitude arose on Twitter a few days ago, when two Tidy supporters were mocking tapply(), treating it as a highly niche function with no value in ordinary daily usage of R. They strongly disagreed with my “workhorse” claim, until I showed them that in the code of ggplot2, Hadley has 7 calls to tapply().

So I did a little investigation of well-known R packages by RStudio and others. The results, which I’ve added as a new section in my essay, are excerpted below.

——————————–

All the breathless claims that Tidy is more modern and clearer, while base-R is old-fashioned and unclear, fly in the face of the fact that RStudio developers, and authors of other prominent R packages, tend to write in base-R, not Tidy. And all of them use some base-R instead of the corresponding Tidy constructs.

package       *apply() calls   mutate() calls
brms          333              0
broom         38               58
datapasta     31               0
forecast      82               0
future        71               0
ggplot2       78               0
glmnet        92               0
gt            112              87
knitr         73               0
naniar        3                44
parsnip       45               33
purrr         10               0
rmarkdown     0                0
RSQLite       14               0
tensorflow    32               0
tidymodels    8                0
tidytext      5                6
tsibble       8                19
VIM           117              19

These are striking numbers to those who learned R via a tidyverse course. In particular, mutate() is one of the very first verbs one learns in a Tidy course, yet mutate() is used 0 times in most of the above packages. And even the packages in which this function is called a lot also have plenty of calls to the base-R *apply() functions, which Tidy is supposed to replace.
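This excerpt does not say how the counts in the table were produced. One plausible way to get numbers of this kind, sketched here purely as an assumption, is a plain-text regex search over a package's R source files. The helper name count_calls() and both patterns are hypothetical. This approach also counts matches inside comments and strings, so it is only approximate:

```r
# Hypothetical helper: count textual matches of a regex across R source files
count_calls <- function(files, pattern) {
  total <- 0L
  for (f in files) {
    lines <- readLines(f, warn = FALSE)
    hits  <- gregexpr(pattern, lines, perl = TRUE)
    # gregexpr() returns -1 for a line with no match, so count positive positions only
    total <- total + sum(vapply(hits, function(h) sum(h > 0), integer(1)))
  }
  total
}

# Assumed patterns: any function whose name ends in "apply", and mutate()
apply_pattern  <- "\\b[a-z]*apply\\s*\\("
mutate_pattern <- "\\bmutate\\s*\\("

# Small self-contained demo on a temporary file
demo <- tempfile(fileext = ".R")
writeLines(c(
  "x <- sapply(1:3, sqrt)",
  "y <- tapply(v, g, mean); z <- lapply(l, length)",
  "d <- mutate(d, w = x + 1)"
), demo)

n_apply  <- count_calls(demo, apply_pattern)
n_mutate <- count_calls(demo, mutate_pattern)
cat("apply-family calls:", n_apply, " mutate calls:", n_mutate, "\n")
```

For a real package one would pass the result of list.files() on the unpacked source's R/ directory (with pattern = "\\.[Rr]$" and full.names = TRUE) as the files argument. The resulting totals depend on the package version and on the exact patterns, so they need not match the table above.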

Now, why do these prominent R developers often use base-R, rather than the allegedly “modern and clearer” Tidy? Because base-R is easier.

And if it’s easier for them, it’s easier still for R learners. In fact, an article discussed later in this essay, aggressively promoting Tidy, actually accuses students who use base-R instead of Tidy of taking the easy way out. Easier, indeed!

19 thoughts on “Base-R Is Alive and Well”

    1. Actually my table column heading is *apply(), meaning all of the apply() family. It is meant to be representative of base-R. Similarly, mutate() is meant to be representative of Tidy. No claim is made that they are equivalent.

  1. While I agree on the base R vs. Tidyverse issue, I think writing code for a data analysis is different than writing for a package. For instance, there is some pressure in package dev to reduce the number of dependencies. That pressure can override ease of use considerations. You can imagine other confounders.

    Most R code is not for packages either. I think to make a stronger argument, you ought to conduct a survey of analysis code as well.

    1. Look at the packages in my table that do use mutate(), so a dplyr dependency is a given. Yet even they use a lot of base-R. The main exception is naniar, which was meant to “abide by tidyverse principles,” according to their self-description; in other words, it is a demonstration project. (Very good package, BTW. I often recommend it to people.)

  2. “Because base-R is easier.” Do these developers say that’s why they used Base-R? You could just reach out and ask Hadley for his thoughts but I don’t think that’s the reason. I do assume Base-R is FASTER and smaller. If I’m writing a package, efficiency is a higher priority. Tidy verbs add a layer that makes code much easier to read, certainly. Easier to use? For many people, yes, obviously, given their popularity.

    1. You should try a few experiments. There really is no efficiency issue. I explain Tidy’s popularity in my essay: “Due to a catchy name, a charismatic developer, the Bandwagon Effect, and highly aggressive marketing by a dominant commercial entity, Tidy has swept the R world.”

  3. There was a talk on this subject at the recent RStudio conference. This prof. used Tidyverse in one stats 101 section and Base-R in another.
    Bottom line: she didn’t find much of a difference in student experience or outcomes. Alas, the presentation materials are not available. https://sched.co/11ian

    1. Yes, this talk was by a tidyverse proponent. I do think the experiment was fairly conducted and reported, but the problem is that Tidy proponents have “dumbed down” the scope of R courses, so students would meet the requirements regardless of what method is used to teach it. Tidy was designed to be useful for only a narrow scope of applications, and Tidy courses reflect that.

    2. I have been using R since 2000. I use base R most of the time. I use ggplot2 occasionally just because it’s popular. Otherwise, base R plotting is more than enough for my work. There is no real need for most tidyverse packages which are just wrappers for base R functions.

  4. > Now, why do these prominent R developers often use base-R, rather than the allegedly “modern and clearer” Tidy? Because base-R is easier.

    How do you come to this conclusion? There are different possible explanations. One that I can see is that, since these are very popular packages, they are taking great care to limit the dependencies, so for example they don’t want ggplot2 to require you to install dplyr.

  5. Tidyverse data manipulation, such as mutate, teaches a set of non-transferable skills and verbs. The apply functions get at the basic SQL GROUP BY idea, whereas Tidyverse has an explicit group_by function which I’ve never seen in code and which acts nothing like the SQL clause.
