Greatly Revised Edition of Tidyverse Skeptic

As a longtime R user and someone with a passionate interest in how people learn, I continue to be greatly concerned about the use of the Tidyverse in teaching noncoder learners of R. Accordingly, I have now thoroughly revised my Tidyverse Skeptic essay. It has been substantially reorganized to focus on teaching R, and it adds a number of new examples and some material on the historical context of the rise of Tidy. On the one hand, I continue to thank RStudio for its overall contribution to the R community; on the other, I believe that using Tidy to teach beginners is actually an obstacle to learning for that group.

I close the essay by first noting that RStudio is now a Public Benefit Corporation, and thus has a much broader public responsibility. I then renew a request I made to RStudio founder/CEO JJ Allaire when he met with me in 2019: “Please encourage R instructors to use a mixture of Tidy and base-R in their teaching.”

Please read the revised essay at the above link. Its Overview section is reproduced below.

  • Again, my focus here is on teaching R to those with little or no coding background. I am not discussing teaching Computer Science students.
  • Tidy was consciously designed to equip learners with just a small set of R tools. The students learn a few dplyr verbs well, but that equips them to do much less with R than a standard R beginners course would teach. That leaves the learners less equipped to put R to real use, compared to “graduates” of standard base-R courses.
  • Thus the “testimonials” in which Tidy teachers of R claim great success are misleading. The “success” is due to watering down the material (and false conflation with ggplot2). The students learn to mimic a few example patterns, but are not equipped to go further.
  • The refusal to teach ‘$’, and the de-emphasis of, or even complete lack of coverage of, R vectors are a major handicap for Tidy “graduates” in making use of most of R’s statistical functions and statistical packages.
  • Tidy is too abstract for beginners, due to the philosophy of functional programming (FP). The latter is popular with many sophisticated computer scientists, but is difficult even for computer science students. Tidy is thus unsuited as the initial basis of instruction for nonprogrammer students of R. FP should be limited and brought in gradually. The same statement applies to base-R’s own FP functions.
  • The FP philosophy replaces straightforward loops with abstract use of functions. Since functions are the most difficult aspect for noncoder R learners, FP is clearly not the right path for such learners. Indeed, even many Tidy advocates concede that it is in various senses often more difficult to write Tidy code than base-R. Hadley says, for instance, “it may take a while to wrap your head around [FP].”
  • A major problem with Tidy for R beginners is cognitive overload: The basic operations contain myriad variants. Though of course one need not learn them all, one needs some variants even for simple operations, e.g. pipes on functions of more than one argument.
  • The obsession among many Tidyers with avoiding loops, the ‘$’ operator, brackets and so on often results in obfuscated code. Once one goes beyond the simple mutate/select/filter/summarize level, Tidy code can be hard to read.
  • Tidy advocates also concede that debugging Tidy code is difficult, especially in the case of pipes. Yet noncoder learners are the ones who make the most mistakes, so it makes no sense to have them use a coding style that makes it difficult to track down their errors.
  • Note once again that in discussing teaching, I am taking the target audience here to be nonprogrammers who wish to use R for data analysis. Eventually, they may wish to make use of FP, but at the crucial beginning stage, keep it simple, with little or no fancy stuff.
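The points above about ‘$’, vectors, and loops versus FP can be made concrete with a minimal base-R sketch. It uses the built-in mtcars data set; the variable names are illustrative assumptions, not taken from the essay.

```r
# Illustrative sketch only: mtcars ships with R; all variable names are hypothetical.

# '$' extracts a column as an ordinary R vector...
hp <- mtcars$hp
wt <- mtcars$wt

# ...which is the form most of R's statistical functions expect.
mean_hp   <- mean(hp)
hp_wt_cor <- cor(hp, wt)

# A straightforward loop: mean mpg for each cylinder count.
cyl_values <- sort(unique(mtcars$cyl))
loop_means <- numeric(length(cyl_values))
for (i in seq_along(cyl_values)) {
  rows <- mtcars$cyl == cyl_values[i]      # logical vector marking matching rows
  loop_means[i] <- mean(mtcars$mpg[rows])  # bracket indexing on a vector
}
names(loop_means) <- cyl_values

# The same computation in functional style, using base-R's tapply().
fp_means <- tapply(mtcars$mpg, mtcars$cyl, mean)
```

Both versions give the same three group means. The loop spells out each step, while tapply() asks the learner to think of mean as a value passed to another function, which is the abstraction the FP bullets refer to.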

22 thoughts on “Greatly Revised Edition of Tidyverse Skeptic”

  1. In my experience as an R instructor, tidyverse is simple and enjoyable for students. Otherwise it would be very difficult for them to stay in the language, delving into its details. From my humble point of view tidyverse has kept R afloat.

      1. You may be right, but consider that every approach has strengths and weaknesses.
        Tidyverse courses are probably “the standard” nowadays and I think every introductory course should be watered down to some extent. Those who develop a passion for data analysis and the R language later have the chance to deepen their knowledge of the language on their own.
        I believe that if tidyverse had not been there, R usage would have declined dramatically, and almost nobody would even read your blog.
        But in any case, reflection and criticism are always healthy. Thanks for your post.

        1. Actually I almost never post to my blog, so I don’t really care much about how many people read it. 🙂

          R was increasing in number of users before Tidy, and would have done so without it. I applauded RStudio for bringing in more R learners, but if they won’t actually be users, then I don’t see the point.

  2. I am looking forward to the updated essay. I have stayed away from the “Tidyverse” and RStudio in my teaching of beginners for the most part, with good results. I find the dominance of the “Tidyverse” baffling. Rather than dumbing down, however, I’m spicing things up in my classes: this term, I have used Emacs (+ Org-mode + ESS) in all my undergrad classes (R, C, C++, SQL, bash, 100 to 400 levels) for the first time, with good results, too. I had actually not expected that all students would be working in Emacs + Org-mode after only a few weeks. I don’t think I’m going to look back at RStudio, and I will keep developing base R (and data.table) alternatives for the sake of clarity, performance, and accessibility. I’m going to write my experiences up this summer. I’m going to assign your essay as reading this week to my advanced students – when they go into internships or into industry, many of them will have to effectively be R teachers, too.

  3. Great essay! I’d add that NSE is a weird part of R to get your head around, and I think Tidy has just made it even more confusing: quosures, enquos, bang-bangs, now curly brackets.

  4. Acknowledging that anecdotes are not data, my experience in teaching R to beginners supports your points. Some of my R novices (primarily engineering students but adult “workshoppers” as well) have found it difficult to build on the basic Tidy verbs to solve the data wrangling problems they confronted in their projects. I have worked one-on-one with learners confronting precisely the cognitive overloads you describe in some of your examples.

    In addition, moving beyond the issues faced by beginners, I have moved from Tidy to base R and data.table for package writing because of the complexity of programming over Tidy.

    I applaud and thank JJ Allaire, Hadley Wickham, Yihui Xie, and everyone at RStudio for providing us with such wonderful tools. And thank you, Norm, for your insightful and thought-provoking essay.

  5. I’m sorry, I have only done a quick read. But I didn’t see this, which I always use, and which I think is the best choice:

    Instead of using:
    mtcars$hwratio <- mtcars$hp/mtcars$wt
    or
    mtcars %>% mutate(hwratio=hp/wt) -> mtcars

    why not this?:

    within(mtcars, hwratio <- hp/wt)

    It avoids repetition of the df name, avoids using the $ operator and it is still base R!!! And with braces one can make as many mutations as one wants…
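A minimal sketch of the braces form mentioned above, assuming the built-in mtcars data. The hwratio column comes from the comment; log_hp and the result name mtcars2 are hypothetical additions for illustration.

```r
# Sketch only: log_hp and mtcars2 are hypothetical names, not from the comment.
mtcars2 <- within(mtcars, {
  hwratio <- hp / wt   # horsepower-to-weight ratio, as in the comment
  log_hp  <- log(hp)   # a second "mutation" inside the same braces
})

# within() returns a modified copy; mtcars itself is unchanged
# unless the result is assigned back to it.
new_cols <- setdiff(names(mtcars2), names(mtcars))
```

Note that a bare call such as within(mtcars, hwratio <- hp/wt) only prints the result. To keep the new column, the result has to be assigned, for example back to mtcars.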

    1. Of course. This is a common approach. However, for beginning learners, it might be better to keep the number of concepts small, saving things like with() and within() for later.

  6. I personally like using the tidyverse only for map creation with the leaflet and sf packages. I think it is very useful for the layer logic of maps. I like the tibble package because, sometimes, I like to create data.frames with the tribble function (i.e. row-wise), but it’s almost always possible to do the same with the (base) matrix function too. For all the rest I go with base R.
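As a rough sketch of the row-wise idea with the base matrix function, assuming purely numeric data (a matrix holds a single type, so mixed columns would need conversion afterwards); the column names and values are made up for illustration:

```r
# Sketch only: the data and the names id/score are hypothetical.
scores <- matrix(c(
  1, 2.5,
  2, 3.0,
  3, 4.5
), ncol = 2, byrow = TRUE,                 # byrow = TRUE fills the matrix row by row
dimnames = list(NULL, c("id", "score")))

scores_df <- as.data.frame(scores)         # convert to a data frame, keeping column names
```

Each line of the c() call reads as one row of the resulting data frame, which is the layout tribble is usually chosen for.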

  7. Things I feel might deserve even greater emphasis:

    1. There is much to be said in favor of doing as much as possible with the basic features of a language, rather than with libraries, and only using one or two libraries at a time. The basic features of a language are stable and reliable. Relying on layers upon layers of libraries introduces complexity. It becomes exponentially harder to understand what is going on under the hood and to understand error messages; code that worked a year ago may no longer work today; recreating the development or production environment becomes a major hurdle by itself. If I have learned one thing in software development it is to value simplicity.

    This point is also made here:

    2. ggplot2 has its place, but so do base graphic plots. Some great recent textbooks with illustrations created with Base R:

    Yes, the authors learned R before the Tidyverse came up, but they also know what they are doing. (BTW, all the code in these books is in Base R, and can be compared to attempts at recreating the examples using Tidyverse and Python, provided by others.)

    Base R graphics makes simple things simple, and allows one to progressively enhance simple plots with full control over every detail. When plots are used for communication (as in books or presentations), full control over every detail matters greatly. It is also easy to generate many similar plots programmatically.

    3. In a business setting, SQL may largely obviate the need for libraries like dplyr. In job interviews for data analyst / data scientist positions, knowledge of SQL is expected. Sometimes it is better to combine different tools / programming languages that each offer a simple solution to a specific problem (Unix philosophy).

    Finally, some reading tips for those who come across this blog post and want to get started with Base R:

    The Art of R Programming: A Tour of Statistical Software Design, by Norman Matloff, our host 🙂

    Learning R: A Step-by-Step Function Guide to Data Analysis, by Richard Cotton

    Hands-On Programming with R: Write Your Own Functions and Simulations, by Garrett Grolemund

  8. I really love your essay. I started learning R through the tidyverse, then switched to base + data.table; since then I use 10 fewer functions and 10 fewer libraries, my code is much faster, and I have something like 100 fewer dependencies. Dependencies may not seem important in a class, but when you are working in a company that’s not true at all. And how many dependencies are in a call like library(tidyverse)? How many libraries were loaded? Why load thousands of functions if you are going to use 10 verbs? And which library do those functions come from? That’s not pedagogical at all.

    Well, in my experience something really important is that since I switched from tidyverse to base + data.table, I started using many more base R functions that I had no idea existed! None of them were ever needed with the tidyverse. I’d also like to point out that with the tidyverse you have to update your knowledge every year, because they change functions all the time. In your essay you write about summarise, summarise_at, etc. But now one needs to use across, and teach students a new function. It is not pedagogical at all to learn one function in year t and another new one in year t+1, and so on.

    Thank you very much for your essay and for your books. The Art of R Programming was one of the best books I’ve read.

  9. I could not agree more. Tidyverse teaches very few translatable skills compared to a base-R course. Loops, indexing, and low-level logic are prerequisites for other languages, yet they are something I keep having to show and teach to students who “know [tidy] R”. While all tools may be useful, starting somewhere that prioritizes autonomy and self-discovery is critical.
