Update on Polynomial Regression in Lieu of Neural Nets

There was quite a reaction to our paper, “Polynomial Regression as an Alternative to Neural Nets” (by Cheng, Khomtchouk, Matloff and Mohanty), leading to discussions/debates on Twitter, Reddit, Hacker News and so on. Accordingly, we have posted a revised version of the paper. Some of the new features:

  • Though originally we had made the disclaimer that we had not yet done any experiments with image classification, there were comments along the lines of “If the authors had included even one example of image classification, even the MNIST data, I would have been more receptive.” So our revision does exactly that, with the result that polynomial regression does well on MNIST even with only very primitive preprocessing (plain PCA).
  • We’ve elaborated on some of the theory (still quite informal, but could be made rigorous).
  • We’ve added elaboration on other aspects, e.g. overfitting.
  • We’ve added a section titled, “What This Paper Is NOT.” Hopefully those who wish to comment without reading the paper (!) this time will at least read this section. 🙂
  • We’ve updated and expanded the results of our data experiments, including more details on how they were conducted.
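
For readers unfamiliar with that preprocessing step: plain PCA just centers the data, forms the covariance matrix, and projects onto its leading eigenvectors. Here is a minimal two-dimensional sketch, written from scratch in Python for brevity; this is illustrative only, not code from the paper or from polyreg, and the function name is my own.

```python
# Plain PCA on 2-D data, from scratch: center, form the covariance
# matrix, and project onto its leading eigenvector.  For a 2x2
# symmetric matrix [[a, b], [b, c]], the first principal axis has
# angle 0.5 * atan2(2b, a - c), so no eigensolver is needed.
import math

def pca_first_component(points):
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    a = sum(x * x for x, _ in centered) / n        # var of coordinate 1
    c = sum(y * y for _, y in centered) / n        # var of coordinate 2
    b = sum(x * y for x, y in centered) / n        # covariance
    theta = 0.5 * math.atan2(2 * b, a - c)
    u = (math.cos(theta), math.sin(theta))         # leading eigenvector
    scores = [x * u[0] + y * u[1] for x, y in centered]
    return u, scores

# Highly correlated toy data; the first PC should lie near slope 2.
pts = [(t, 2 * t + 0.1 * ((-1) ** i)) for i, t in enumerate(range(10))]
u, scores = pca_first_component(pts)
```

In the MNIST setting one of course retains many components, not one, but the operation per component is exactly this projection.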

We are continuing to add features to our associated R package, polyreg. More news on that to come.

Thanks for the interest. Comments welcome!


Neural Networks Are Essentially Polynomial Regression

You may be interested in my new arXiv paper, joint work with Xi Cheng, an undergraduate at UC Davis (now heading to Cornell for grad school); Bohdan Khomtchouk, a postdoc in biology at Stanford; and Pete Mohanty, a Science, Engineering & Education Fellow in statistics at Stanford. The paper is of a provocative nature, and we welcome feedback.

A summary of the paper is:

  • We present a very simple, informal mathematical argument that neural networks (NNs) are in essence polynomial regression (PR). We refer to this as NNAEPR.
  • NNAEPR implies that we can use our knowledge of the “old-fashioned” method of PR to gain insight into how NNs — widely viewed somewhat warily as a “black box” — work inside.
  • One such insight is that the outputs of an NN layer will be prone to multicollinearity, with the problem becoming worse with each successive layer. This in turn may explain why convergence issues often develop in NNs. It also suggests that NN users tend to use overly large networks.
  • NNAEPR suggests that one may abandon using NNs altogether, and simply use PR instead.
  • We investigated this on a wide variety of datasets, and found that in every case PR did as well as, and often better than, NNs.
  • We have developed a feature-rich R package, polyreg, to facilitate using PR in multivariate settings.
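
As a toy illustration of the multicollinearity point above — my own, far simpler than the NN setting analyzed in the paper — note that even successive powers of a single positive-valued feature are strongly correlated, the analogue of what happens among the outputs of a layer:

```python
# Toy illustration (not from the paper): on a positive interval,
# x and x^2 are nearly collinear, i.e. their Pearson correlation
# is close to 1.  Analogous near-collinearity among the outputs of
# an NN layer is the issue discussed in the paper.
import math

def pearson(u, v):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (su * sv)

xs = [1 + i / 100 for i in range(101)]   # grid on [1, 2]
r = pearson(xs, [x * x for x in xs])     # correlation of x with x^2
```

With near-collinear predictors, the usual linear-model pathologies — unstable coefficient estimates, ill-conditioned computations — follow, which is the connection to NN convergence problems.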

Much work remains to be done (see the paper), but our results so far are very encouraging. By using PR, one can avoid the headaches of NNs, such as selecting good combinations of tuning parameters, dealing with convergence problems, and so on.
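
To make the PR side concrete, here is a minimal from-scratch sketch — in Python rather than R, and not our polyreg code; all names here are illustrative — of fitting a degree-2 polynomial model by ordinary least squares via the normal equations:

```python
# Degree-2 polynomial regression via the normal equations,
# (X'X) beta = X'y, solved from scratch.  A toy stand-in for
# what a package such as polyreg automates in the multivariate case.

def solve(A, b):
    """Gaussian elimination with partial pivoting for A x = b."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def polyfit2(xs, ys):
    """Fit y ~ b0 + b1*x + b2*x^2 by least squares."""
    X = [[1.0, x, x * x] for x in xs]
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(3)]
           for i in range(3)]
    Xty = [sum(X[k][i] * ys[k] for k in range(len(X))) for i in range(3)]
    return solve(XtX, Xty)

xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [1.0 + 2.0 * x + 3.0 * x * x for x in xs]  # exactly quadratic, no noise
beta = polyfit2(xs, ys)                         # recovers (1, 2, 3)
```

No iterative training, no learning rate, no convergence diagnostics — one linear solve.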

Also available are the slides for our presentation at GRAIL on this project.

Women in R

Last week I gave one of the keynote addresses at R/Finance 2018 in Chicago. I considered it an honor and a pleasure to be there, both because of the stimulating intellectual exchange and the fine level of camaraderie and hospitality that prevailed. I mentioned at the start of my talk that the success of this conference, now in its tenth year, epitomized the wonderful success enjoyed nowadays by the R language.

On the first day of the conference, one of the session chairs announced that a complaint had been made by the group R-Ladies, concerning the fact that all of the talks were given by men. The chair apologized for that, and promised efforts to remedy the situation in the future. Then on the second day, room was made in the schedule for two young women from R-Ladies to make a presentation. There also was a research paper presented by a woman, added at the last minute; she had presented work at the conference in the past.

I have been interested in status-of-women issues for a long time, and I spoke briefly with one of the R-Ladies women after the session. I suggested that she read a blog post I had written that raised some troubling related issues.

But I didn’t give the matter much further thought until Tuesday of this week, when a friend asked me about the “highly eventful” conference. That comment initially baffled me, but it turned out that he was referring to the R-Ladies controversy, which he had been following in the “tweetstorm” on the issue under the hashtag #rfinance2018. Not being a regular Twitter user, I had been unaware of this.

Again, issues of gender inequity (however defined) have been a serious, proactive concern of mine over the years. I have been quite active in championing the cases of talented female applicants for faculty positions at my university, for instance. Of my five current research students, four are women. In fact, one of them, Robin Yancey, is a coauthor with me of the partools package that played a prominent role in my talk at this conference.

That said, I must also say that those tweets criticizing the conference organizers were harsh and unfair. As a member of the program committee pointed out, other than the keynote speakers, the program consists of papers submitted for consideration by potential authors, and it turned out that no papers had been submitted by women. Many readers of those tweets will think that the program committee is prejudiced against women, which I really doubt is the case.

The women who complained also cited lack of a Code of Conduct for the conference. This too turned out to be a misleading claim, as there had been a Code of Conduct posted by the University of Illinois at Chicago, the host of the conference.

So, apparently there was no error of commission here, but some may feel an error of omission did occur. Arguably any conference should make more proactive efforts to encourage female potential authors to submit papers for consideration in the program. Many conferences have invited talks, for instance, and R/Finance may wish to consider this.

However, there is, as is often the case, an issue of breadth of the pool. Granted, things like applicant pools are often used as excuses by, for example, employers for significant gender imbalances in their workforces. But as far as I know, the current state of affairs is:

  • The vast majority of creators (i.e. ‘cre’ status) of R packages in CRAN etc. are men.
  • The authors of the vast majority of books involving R are men.
  • The authors of the vast majority of research papers related to R are men.

It is these activities that lead to giving conference talks, and groups like R-Ladies should promote more female participation in them. We all know some outstanding women in those activities, but to truly solve the problem, many more women need to get involved.

(Some material here was updated on July 21, 2018.)

Xie Yihui, R Superstar and Mensch

Yesterday a friend told me, “Yihui has written the most remarkably open blog post, and you’ve got to read it.” I did and it was. Though my post here is not about R per se, it is about a great contributor to R, our Yihui, Dr. of Statistics and (according to him) Master of Procrastination.

I can relate to his comments personally, and indeed he has written the essay that I never had the courage to write about myself. But the big message in Yihui’s posting is that, really, that MP degree of his is far more useful than his PhD. If Yihui had been the Tiger Cub type (child of a Tiger Mom), we wouldn’t have knitr, and a lot more.

I was a strong opponent of Tiger Mom-ism long before Amy Chua coined the term. To me, it is highly counterproductive, destroying precious creativity and often causing much misery. I’m not endorsing laziness, mind you, but as Yihui shows, creative procrastination can produce wonderful results. As I write at the above link,

I submit that innovative people tend to be dreamers. I’m certainly not advocating that parents raise lazy kids, but all that intense regimentation in Tiger Mom-land clearly gives kids no chance to breathe, let alone dream.

Yihui is a dreamer, and the R community is much better for it.

I could tell Yihui is exceptionally creative the first day I met him. Who else would have the chutzpah to name his Web site The Capital of Statistics? 🙂

As mentioned, it was quite courageous on Yihui’s part to write his essay, but he is doing a public good in doing so; many, I’m sure, will find it inspirational.

Good for him, and good for R.


Regression Analysis — What You Should’ve Been Taught But Weren’t, and Were Taught But Shouldn’t Have Been

The above title was the title of my talk this evening at our Bay Area R Users Group. I had been asked to talk about my new book, and I presented four of the myths that are dispelled in the book.

Hadley also gave an interesting talk, “An introduction to tidy evaluation,” involving some library functions that are aimed at writing clearer, more readable R. The talk came complete with audience participation, very engaging and informative.

The venue was GRAIL, a highly-impressive startup. We will be hearing a lot more about this company, I am sure.

cdparcoord: Parallel Coordinates Plots for Categorical Data

My students, Vincent Yang and Harrison Nguyen, and I have developed a new data visualization package, cdparcoord, available now on CRAN. It can be viewed as an extension of the freqparcoord package, written by a former grad student, Yingkang Xie, and myself, which I have written about before in this blog.

The idea behind both packages is to remedy the “black screen problem” in parallel coordinates plots, in which there are so many lines plotted that the screen fills and no patterns are discernible. We avoid this by plotting only the most “typical” lines, as defined by estimated nonparametric density value in freqparcoord and by simple counts in cdparcoord.
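
The counts-based version of that idea is easy to sketch — hypothetically here, in Python rather than R and without the actual plotting; cdparcoord itself does much more:

```python
# Sketch of the "most typical lines" idea behind cdparcoord:
# tally each distinct tuple of categorical values and keep only
# the k most frequent for plotting, so that rare combinations
# don't blacken the screen.
from collections import Counter

def typical_tuples(records, k):
    """Return the k most common value tuples, with their counts."""
    counts = Counter(tuple(r) for r in records)
    return counts.most_common(k)

# Illustrative made-up data.
data = [
    ("male",   "BS",  "engineer"),
    ("male",   "BS",  "engineer"),
    ("female", "MS",  "analyst"),
    ("male",   "BS",  "engineer"),
    ("female", "PhD", "scientist"),
    ("female", "MS",  "analyst"),
]
top = typical_tuples(data, 2)   # only these tuples would be plotted
```

Each surviving tuple then becomes one polyline across the parallel coordinate axes, optionally weighted by its count.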

There are lots of pretty (and hopefully insight-evoking) pictures, plus directions for quickstart use of the package, in the vignette. We have an academic paper available that explains the background, related work and so on.

Just as with Yingkang on freqparcoord, huge credit for cdparcoord goes to Vincent and Harrison, who came up with creative, remarkable solutions to numerous knotty problems that arose during the development of the package. And get this — they are undergraduates (or were at the time; Harrison graduated in June).

Wrong on an Astronomical Scale

I recently posted an update regarding our R package revisit, aimed at partially remedying the reproducibility crisis, both in the sense of (a) providing transparency to data analyses and (b) flagging possible statistical errors, including misuse of significance testing.

One person commented to me that it may not be important for the package to include warnings about significance testing. I replied that on the contrary, such problems are by far the most common in all of statistics. Today I found an especially egregious case in point, not only because of the errors themselves but even more so because of the shockingly high mathematical sophistication of the culprits.

This fiasco occurs in the article, “Gravitational Waves and Their Mathematics” in the August 2017 issue of the Notices of the AMS, by mathematics and physics professors Lydia Bieri, David Garfinkle and Nicolás Yunes. In describing the results of a dramatic experiment claimed to show the existence of gravitational waves, the authors state,

…the aLIGO detectors recorded the interference pattern associated with a gravitational wave produced in the merger of two black holes 1.3 billion light years away. The signal was so loud (relative to the level of the noise) that the probability that the recorded event was a gravitational wave was much larger than 5𝜎, meaning that the probability of a false alarm was much smaller than 10⁻⁷.

Of course, in that second sentence, the second half is (or at least reads as) the all-too-common error of interpreting a p-value as the probability that the null hypothesis is correct. But the first half (a probability “much larger than 5𝜎”) is quite an “innovation” in the World of Statistical Errors, comparing a probability to a number of standard deviations. Actually, it may be a challenge to incorporate a warning for this kind of error in revisit. 🙂
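
For the record, the tail area that a “5𝜎” result corresponds to is easy to compute, and it is a probability calculated *assuming* the noise-only null hypothesis — not the probability that the null hypothesis is true. A quick sketch:

```python
# The p-value attached to a "5-sigma" result is the tail area of the
# standard normal beyond z = 5 -- a probability computed ASSUMING the
# null hypothesis (noise only), not the probability of the null itself.
import math

def one_sided_p(z):
    """P(Z > z) for standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

p5 = one_sided_p(5.0)   # about 2.9e-7
```

Turning that tail area into the probability that the detected event was real would require Bayes’ rule and a prior, which is precisely what the quoted sentence glosses over.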

Keep in mind that the authors of this article were NOT the ones who conducted the experiments, nor were they even in collaboration with the study team. But I have seen such things a number of times in physics; it is reminiscent of some of the controversy over the confirmation of the existence of the Higgs boson. I may actually disagree there, but it again shows that, at the least, physicists should stop treating statistics as not worth the effort needed for useful insight.

In that light, this in-depth analysis by the experimenters looks well worth reading.