Greatly Revised Edition of Tidyverse Skeptic

April 2, 2022 matloff 23 Comments

As a longtime R user and someone with a passionate interest in how people learn, I continue to be greatly concerned about the use of the Tidyverse in teaching noncoder learners of R. Accordingly, I have now thoroughly revised my Tidyverse Skeptic essay. It has been greatly reorganized, with a focus on teaching R, a number of new examples, and some material on the historical context of the rise of Tidy. On the one hand, I continue to thank RStudio for its overall contribution to the R community; on the other, I believe that using Tidy to teach beginners is actually an obstacle to learning for that group.

I close the essay by first noting that RStudio is now a Public Benefit Corporation, and thus has a much broader public responsibility. I then renew a request I made to RStudio founder/CEO JJ Allaire when he met with me in 2019: “Please encourage R instructors to use a mixture of Tidy and base-R in their teaching.”

Please read the revised essay at the above link. Its Overview section is reproduced below.

Again, my focus here is on teaching R to those with little or no coding background. I am not discussing teaching Computer Science students.

Tidy was consciously designed to equip learners with just a small set of R tools. The students learn a few dplyr verbs well, but that equips them to do much less with R than a standard beginners’ R course would. That leaves the learners less equipped to put R to real use, compared to “graduates” of standard base-R courses.

Thus the “testimonials” in which Tidy teachers of R claim great success are misleading. The “success” is due to watering down the material (and false conflation with ggplot2). The students learn to mimic a few example patterns, but are not equipped to go further.

The refusal to teach ‘$’, and the de-emphasis of, or even complete lack of coverage of, R vectors are a major handicap for Tidy “graduates” in making use of most of R’s statistical functions and statistical packages.
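As a small illustration of this point (a sketch using the built-in mtcars data, not an example taken from the essay), many base-R statistical functions take plain vectors as arguments, and ‘$’ and brackets are the most direct way to pull those vectors out of a data frame. The variable names here are chosen only for illustration.

```r
# Illustrative sketch only; mtcars ships with base R.
mpg    <- mtcars$mpg              # '$' extracts one column as a numeric vector
auto   <- mpg[mtcars$am == 0]     # brackets + logical vector: automatic cars
manual <- mpg[mtcars$am == 1]     # manual-transmission cars

mean(manual) - mean(auto)         # difference in mean mpg between the groups
tt <- t.test(manual, auto)        # a two-sample t-test takes two vectors
tt$p.value                        # '$' also extracts pieces of the result
```

The same two tools, ‘$’ and brackets, are used both to prepare the input and to read the output, which is typical of R’s statistical functions.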

Tidy is too abstract for beginners, due to the philosophy of functional programming (FP). The latter is popular with many sophisticated computer scientists, but is difficult even for computer science students. Tidy is thus unsuited as the initial basis of instruction for nonprogrammer students of R. FP should be limited and brought in gradually. The same statement applies to base-R’s own FP functions.

The FP philosophy replaces straightforward loops with abstract use of functions. Since functions are the most difficult aspect for noncoder R learners, FP is clearly not the right path for such learners. Indeed, even many Tidy advocates concede that it is in various senses often more difficult to write Tidy code than base-R. Hadley says, for instance, “it may take a while to wrap your head around [FP].”
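To make the contrast concrete, here is a minimal sketch (my own illustration on the built-in mtcars data, using base R’s sapply() as the FP example) that counts, for each column, how many values exceed that column’s mean. The loop version uses only assignment and indexing; the FP version requires the learner to write and pass a function.

```r
# Illustrative sketch only: per-column count of values above the column mean.

# Loop version: step-by-step, one column at a time
n_above_loop <- integer(ncol(mtcars))
names(n_above_loop) <- names(mtcars)
for (nm in names(mtcars)) {
  x <- mtcars[[nm]]                    # current column as a vector
  n_above_loop[nm] <- sum(x > mean(x)) # count values above the mean
}

# FP version: the work is wrapped in an anonymous function passed to sapply()
n_above_fp <- sapply(mtcars, function(x) sum(x > mean(x)))

all.equal(n_above_loop, n_above_fp)    # both approaches give the same counts
```

The FP version is shorter, but it asks a beginner to understand anonymous functions and function arguments before they can read it.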

A major problem with Tidy for R beginners is cognitive overload: The basic operations contain myriad variants. Though of course one need not learn them all, one needs some variants even for simple operations, e.g. pipes on functions of more than one argument.

The obsession among many Tidyers with avoiding loops, the ‘$’ operator, brackets and so on often results in obfuscated code. Once one goes beyond the simple mutate/select/filter/summarize level, Tidy programming can be of low readability.

Tidy advocates also concede that debugging Tidy code is difficult, especially in the case of pipes. Yet noncoder learners are the ones who make the most mistakes, so it makes no sense to have them use a coding style that makes it difficult to track down their errors.

Note once again that in discussing teaching, I am taking the target audience here to be nonprogrammers who wish to use R for data analysis. Eventually, they may wish to make use of FP, but at the crucial beginning stage, keep it simple, with little or no fancy stuff.

23 thoughts on “Greatly Revised Edition of Tidyverse Skeptic”

Sebastian Varela says:

April 3, 2022 at 9:52 am

In my experience as an R instructor, tidyverse is simple and enjoyable for students. Otherwise it would be very difficult for them to stay in the language, delving into its details. From my humble point of view tidyverse has kept R afloat.

1. matloff says:
  
  April 3, 2022 at 10:08 am
  
  I think your comments here are consistent with mine, e.g. that Tidy-based courses are watered down compared to the standard ones?
  
  1. Sebastian Varela says:
    
    April 3, 2022 at 11:14 am
    
    You may be right, but consider that every approach has strengths and weaknesses.
    Tidyverse courses are probably “the standard” nowadays and I think every introductory course should be watered down to some extent. Those who develop a passion for data analysis and the R language later have the chance to deepen their knowledge of the language on their own.
    I believe if tidyverse had not been there, R usage would have declined dramatically, and almost nobody would even read your blog.
    But in any case, reflection and criticism is always healthy. Thanks for your post.
    
    1. matloff says:
      
      April 3, 2022 at 11:34 am
      
      Actually I almost never post to my blog, so I don’t really care much about how many people read it. 🙂
      
      R was increasing in number of users before Tidy, and would have continued to do so without it. I applauded RStudio for bringing in more R learners, but if they won’t actually be users, then I don’t see the point.
      
Marcus Birkenkrahe says:

April 3, 2022 at 7:32 pm

I am looking forward to the updated essay. I have stayed away from the “Tidyverse” and RStudio in my teaching of beginners for the most part, with good results. I find the dominance of the “Tidyverse” baffling. Rather than dumbing down, however, I’m spicing things up in my classes: this term, I have used Emacs (+ Org-mode + ESS) in all my undergrad classes (R, C, C++, SQL, bash, 100 to 400 levels) for the first time, with good results, too. I had actually not expected that all students would be working in Emacs + Org-mode after only a few weeks. I don’t think I’m going to look back at RStudio, and I will keep developing base R (and data.table) alternatives for the sake of clarity, performance, and accessibility. I’m going to write up my experiences this summer. I’m going to assign your essay as reading this week to my advanced students – when they go into internships or into industry, many of them will effectively have to be R teachers, too.

1. matloff says:
  
  April 3, 2022 at 8:37 pm
  
  Thanks, interesting comments. Emacs is an extremely sophisticated tool.
  
  1. Marcus Birkenkrahe says:
    
    April 4, 2022 at 6:39 am
    
    Yes, it’s sophisticated, but not forbiddingly so (apparently – many of my students aren’t even CS) and just the right kind of sophisticated (one doesn’t need to go into Lisp to use it effectively) as Dirk Eddelbuettel shows in his intro series: https://youtu.be/1YOrd7NCGkg
    
    1. matloff says:
      
      April 4, 2022 at 12:45 pm
      
      Thanks, I didn’t know that Dirk was doing this.
      
Pingback: Greatly Revised Edition of Tidyverse Skeptic | R-bloggers
1. matloff says:
  
  April 5, 2022 at 9:29 am
  
  Thanks!
  
SmokeyShakers says:

April 4, 2022 at 7:44 am

Great essay! I’d add that NSE is a weird part of R to get your head around, and I think Tidy has just made it even more confusing: quosures, enquo(), bang-bangs, and now curly brackets.

1. matloff says:
  
  April 4, 2022 at 12:44 pm
  
  Good point. The more problems are pointed out, the more Rube Goldberg Machine-like the solutions.
  
Richard Layton says:

April 4, 2022 at 9:48 am

Acknowledging that anecdotes are not data, my experience in teaching R to beginners supports your points. Some of my R novices (primarily engineering students but adult “workshoppers” as well) have found it difficult to build on the basic Tidy verbs to solve the data wrangling problems they confronted in their projects. I have worked one-on-one with learners confronting precisely the cognitive overloads you describe in some of your examples.

In addition, moving beyond the issues faced by beginners, I have moved from Tidy to base R and data.table for package writing because of the complexity of programming over Tidy.

I applaud and thank JJ Allaire, Hadley Wickham, Yihui Xie, and everyone at RStudio for providing us with such wonderful tools. And thank you, Norm, for your insightful and thought-provoking essay.

Jens says:

April 5, 2022 at 4:43 am

Typo in title: “More Effectie Manner” -> Effecti*v*e

Luiz F. P. Droubi says:

April 8, 2022 at 10:14 am

I’m sorry, I have just made a quick read. But I didn’t see this, which I always use, and I think is the best choice:

Instead of using:
mtcars$hwratio <- mtcars$hp/mtcars$wt
mtcars %>% mutate(hwratio=hp/wt) -> mtcars

why not this?:

within(mtcars, hwratio <- hp/wt)

It avoids repetition of the df name, avoids using the $ operator and it is still base R!!! And with braces one can make as many mutations as one wants…
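For readers who want to try this, here is a minimal base-R sketch of the approaches being compared, run on copies of the built-in mtcars data. It is an illustration rather than code from the essay; the names d1, d2, d3 and the kpl column are made up for the example. One detail worth noting is that within() returns a modified copy, so its result has to be assigned for the new column to persist.

```r
# Illustrative sketch only, using copies of the built-in mtcars data.

# 1. '$' assignment: the data frame name is repeated on both sides
d1 <- mtcars
d1$hwratio <- d1$hp / d1$wt

# 2. within(): columns are referred to by name; the result must be assigned
d2 <- within(mtcars, hwratio <- hp / wt)

# 3. Braces allow several new columns in a single within() call
d3 <- within(mtcars, {
  hwratio <- hp / wt
  kpl     <- mpg * 0.425   # hypothetical extra column: approx. km per litre
})

all.equal(d1$hwratio, d2$hwratio)   # both approaches give the same values
"hwratio" %in% names(mtcars)        # the original data frame is unchanged
```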

1. matloff says:
  
  April 8, 2022 at 10:31 am
  
  Of course. This is a common approach. However, for beginning learners, it might be better to keep the number of concepts small, saving things like with() and within() for later.
  
Luiz F. P. Droubi says:

April 8, 2022 at 10:19 am

I personally like using the tidyverse only for map creation with the leaflet and sf packages. I think it is very useful for the layer logic of maps. I like the tibble package because I sometimes like to create data.frames with the tribble function (i.e. row-wise), but it’s almost always possible to do the same with the (base) matrix function too. For all the rest I go with base R.

Jens says:

April 9, 2022 at 1:38 am

Things I feel might deserve even greater emphasis:

1. There is much to be said in favor of doing as much as possible with the basic features of a language, rather than with libraries, and only using one or two libraries at a time. The basic features of a language are stable and reliable. Relying on layers upon layers of libraries introduces complexity. It becomes exponentially harder to understand what is going on under the hood and to understand error messages; code that worked a year ago may no longer work today; recreating the development or production environment becomes a major hurdle by itself. If I have learned one thing in software development it is to value simplicity.

This point is also made here: https://www.tinyverse.org

2. ggplot2 has its place, but so do base graphic plots. Some great recent textbooks with illustrations created with Base R:

https://www.statlearning.com
https://avehtari.github.io/ROS-Examples/

Yes, the authors learned R before the Tidyverse came up, but they also know what they are doing. (BTW, all the code in these books is in Base R, and can be compared to attempts at recreating the examples using Tidyverse and Python, provided by others.)

Base R graphics makes simple things simple, and allows one to progressively enhance simple plots with full control over every detail. When plots are used for communication (as in books or presentations), full control over every detail matters greatly. It is also easy to generate many similar plots programmatically.

3. In a business setting, SQL may largely obviate the need for libraries like dplyr. In job interviews for data analyst / data scientist positions, knowledge of SQL is expected. Sometimes it is better to combine different tools / programming languages that each offer a simple solution to a specific problem (Unix philosophy).

Finally, some reading tips for those who come across this blog post and want to get started with Base R:

The Art of R Programming: A Tour of Statistical Software Design, by Norman Matloff, our host 🙂

Learning R: A Step-by-Step Function Guide to Data Analysis, by Richard Cotton

Hands-On Programming with R: Write Your Own Functions and Simulations, by Garrett Grolemund

Anonymous says:

May 24, 2022 at 11:23 pm

I really love your essay. I started learning R through tidyverse, then I switched to base + data.table, and since then I use 10 fewer functions and 10 fewer libraries, my code is much faster, and I have something like 100 fewer dependencies. Dependencies may not seem important in a class, but when you are working in a company that is not true at all. And how many dependencies are in a call like library(tidyverse)? How many libraries are loaded? Why load thousands of functions if you are going to use 10 verbs? And which library are those functions coming from? That’s not pedagogical at all.

In my experience, something really important is that since I switched from tidyverse to base + data.table, I started using many more base R functions that I had no idea existed, and that I never needed with tidyverse. I’d also like to point out that with tidyverse you have to update your knowledge every year because they change functions all the time. In your essay you write about summarise, summarise_at, etc. But now one needs to use across, and teach students a new function. It is not pedagogical at all to learn one function in year t and another new one in year t+1, and so on.

Thank you very much for your essay and for your books. The Art of R Programming was one of the best books I’ve read.

Thomas Kelly says:

August 11, 2022 at 3:15 pm

I could not agree more. Tidyverse offers very few transferable skills compared to a base-R course. Loops, indexing, and low-level logic are prerequisites for other languages, yet they are something I keep having to show and teach to students who “know [tidy] R”. While all tools may be useful, starting somewhere that prioritizes autonomy and self-discovery is critical.

Daniel says:

October 12, 2022 at 9:56 am

Thanks Prof Matloff. I have difficulty reading a lot of tidyverse code, especially when people use pipes. Intuition and general understanding of concepts seem to have been replaced by a plethora of functions that all do the same thing but on slightly different data structures. The tidyverse feels like my personal helper function library that I would never inflict on anybody else. I can imagine that a student raised in the tidyverse could quickly find themselves stranded in a novel situation without knowing what to do.

