Regression Analysis — What You Should’ve Been Taught But Weren’t, and Were Taught But Shouldn’t Have Been

That was the title of my talk this evening at our Bay Area R Users Group. I had been asked to talk about my new book, and I presented four of the myths that the book dispels.

Hadley also gave an interesting talk, “An introduction to tidy evaluation,” involving some library functions that are aimed at writing clearer, more readable R. The talk came complete with audience participation, and was very engaging and informative.

The venue was GRAIL, a highly impressive startup. We will be hearing a lot more about this company, I am sure.


cdparcoord: Parallel Coordinates Plots for Categorical Data

My students, Vincent Yang and Harrison Nguyen, and I have developed a new data visualization package, cdparcoord, available now on CRAN. It can be viewed as an extension of the freqparcoord package, written by a former grad student, Yingkang Xie, and me, which I have written about before in this blog.

The idea behind both packages is to remedy the “black screen problem” in parallel coordinates plots, in which there are so many lines plotted that the screen fills and no patterns are discernible. We avoid this by plotting only the most “typical” lines, as defined by estimated nonparametric density value in freqparcoord and by simple counts in cdparcoord.
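To make the count-based notion of “typical” concrete, here is a minimal sketch of the idea, not the package's actual code or API. It is written in Python with only the standard library; the function name `most_frequent_rows`, the parameter `k`, and the toy data are all hypothetical, made up for illustration. Each distinct combination of categorical values is tallied, and only the `k` most frequent combinations are kept as candidate lines to plot.

```python
from collections import Counter


def most_frequent_rows(rows, k):
    """Return the k most common rows as (row, count) pairs, most frequent first.

    Hypothetical helper for illustration only; it is not part of cdparcoord.
    """
    # Each row becomes a hashable tuple so identical rows are counted together.
    counts = Counter(tuple(row) for row in rows)
    return counts.most_common(k)


# Made-up toy data: (gender, degree, occupation)
rows = [
    ("F", "BS", "engineer"),
    ("M", "BS", "engineer"),
    ("F", "BS", "engineer"),
    ("M", "MS", "analyst"),
    ("F", "BS", "engineer"),
    ("M", "BS", "engineer"),
    ("F", "PhD", "analyst"),
]

# Keep only the two most frequent patterns; each would become one plotted line.
top = most_frequent_rows(rows, k=2)
for pattern, count in top:
    print(count, pattern)
```

With seven rows but only two lines drawn, the plot stays readable, and the counts could be mapped to something like line color or thickness. How the real package handles ties, missing values, and display is not covered by this sketch.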

There are lots of pretty (and hopefully insight-evoking) pictures, plus directions for quickstart use of the package, in the vignette. We have an academic paper available that explains the background, related work, and so on.

Just as with Yingkang on freqparcoord, huge credit for cdparcoord goes to Vincent and Harrison, who came up with creative, remarkable solutions to numerous knotty problems that arose during the development of the package. And get this — they are undergraduates (or were at the time; Harrison graduated in June).