Prominent statistician Frank Harrell has come out with a radically new R tutorial, rflow. The name is short for “R workflow,” but I call it “R in a box” –everything one needs for beginning serious usage of R, starting from little or no background.
By serious usage I mean real applications in which the user has a substantial computational need. This could be a grad student researcher, a person who needs to write data reports for her job, or simply a person who is doing personal analysis such as stock picking.
Like other tutorials/books, rflow covers data manipulation, generation of tables and graphics, etc. But UNLIKE many others, rflow empowers the user to handle general issues as they inevitably pop up, as opposed to just teaching a few basic, largely ungeneralizable operations. I’ve criticized the tidyverse in particular for that latter problem, but really no tutorial, including my own, has this key “R in a box” quality.
The tutorial is arranged into 19 short “chapters,” beginning with R Basics, all the way through such advanced topics as Manipulating Longitudinal Data and Parallel Computing. The exciting new Quarto presentation tool by RStudio is featured, as is the data.table package, essential for practical use of large datasets.
Note carefully that this tutorial is the product of Frank’s long experience “in the trenches,” conducting intensive data analysis in biomedical applications. (This specific field of application is irrelevant; rflow is just as useful to, say, marketing analysts, as it is for medicine.) His famous monograph, Regression Modeling Strategies, is a standard reference in the field. Even I, as the author of my own regression book, often find myself checking out what Frank has to say in his book about various topics.
This point about rflow arising from Frank’s long experience dealing with real data is absolutely key, in my view. And his choice of topics, and especially their ordering, reflects that. For instance, he brings in the topic of missing data early in the tutorial.
Anyone who teaches R, or is learning R, should check out rflow.