My Presentation at useR! 2017, Etc.

I gave a talk titled “Parallel Computation in R: What We Want, and How We (Might) Get It” at last week’s useR! 2017 conference in Brussels. You can view my slides here; I believe the conference organizers said the videos would be posted online as well, though I’m not certain of that.

The goal of the talk was to propose general design patterns for parallel computation in R, i.e. general approaches that should be useful in many applications. I emphasized that this was just one person’s opinion, and I expected the Spark fans to disagree with my view that Spark is not a very useful tool for useRs. As it turned out, several speakers in other talks were negative about Spark as well. One gentleman did try to defend Spark during the Q&A, but when he talked to me afterward, he turned out not to be a huge Spark fan after all; he had largely been playing devil’s advocate.

My examples of course involved partools, the package I’ve been developing for parallel computation in R. (Duncan Temple Lang’s PhD student Clark Fitzgerald is now involved in developing the package as well.) However, I noted that the same general principles could be applied with some other packages, such as ddR and multidplyr.
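To make the general idea concrete: one of the design patterns in this spirit is to split the data across worker processes once, leave it resident there, and then run repeated computations against the resident chunks, moving only small results back. The sketch below is my own illustration using only base R’s parallel package, not partools itself (partools has its own API); the chunk variable name `mychunk` and the mtcars example are assumptions for illustration.

```r
library(parallel)

cl <- makeCluster(2)  # two local worker processes

# Split a data frame into one chunk per worker and ship each chunk once.
chunks <- split(mtcars, rep(1:2, length.out = nrow(mtcars)))
invisible(clusterApply(cl, chunks, function(chunk) {
  mychunk <<- chunk  # keep the chunk resident in the worker's global env
  NULL
}))

# Later computations reuse the resident chunks; only small summaries
# travel back over the "network" (here, local sockets).
permeans <- clusterEvalQ(cl, colMeans(mychunk[, c("mpg", "hp")]))

# Chunks are equal-sized here, so averaging the per-chunk means
# recovers the overall column means.
overall <- Reduce(`+`, permeans) / length(permeans)
print(overall)

stopCluster(cl)
```

The point of the pattern is that the (potentially large) data crosses the interconnect only once, at split time; every subsequent query pays only for the small aggregates.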

There were of course a number of excellent talks, many more than I could attend. Among the ones I did attend, I would mention a few in particular:

  • A talk by Nick Ulle, another student of Duncan’s, about his project to bring the LLVM compiler world to R. This is a tough challenge, but Nick is making impressive progress.
  • A talk by Kylie Bemis, a postdoc at Northeastern University, on her matter R package, which does distributed file allocation in a clever, general manner.
  • I did not get to see Jim Harner’s talk about his R IDE, rc2, but he demonstrated it for me on his laptop; very interesting.
  • Microsoft’s David Smith, one of the pioneers of the S/R world, gave an interesting “then and now” talk. He listed questions that non-useRs would ask a few years ago when he suggested they switch to R, but which they no longer ask, demonstrating the huge increase in R usage in recent years, and in its power and usability.

My wife and I had fun exploring Brussels — one wrong decision in a subway station resulted in our ending up in front of the EU headquarters, an interesting error to make. And by an amazing stroke of good luck, the other summer conference at which I’ll be giving a talk, Small Area Estimation 2017, is to be held in Paris the very next week.

7 thoughts on “My Presentation at useR! 2017, Etc.”

  1. Even before your bullet points under “Not so simple”, there’s a crushing problem with doing parallel computing in R on GPUs – the dearth of high-quality open source system-level software (drivers and libraries).

    If you want to use a GPU, you have to use proprietary software, and even that is often of low quality and / or poorly documented. This is not an engineering problem; it’s a marketing one.

    The target market for GPUs is games, video editing and crypto-currencies, not low-cost high-performance statistical computing. And the market is dominated by one vendor – NVidia. Until that changes, there’s going to be precious little progress in GPU computing in R.

  2. Thanks for your presentation (I was at useR!); it was really constructive, with clear and interesting comments, especially about the hype around Hadoop/Spark and automatic parallelisation.

  3. Very interesting talk! A quote from your slides: “UseRs may have become fairly good programmers, but lack systems knowledge.”
    Do you have any recommendations about where one can obtain this knowledge? And is it worth the effort if you want to become a better programmer?

    P.s.: My first comment here, but I really like your blog!

    1. That question was asked in my talk. I answered by saying that I have an introduction to such things in my book, but that this is not enough. One builds this knowledge in haphazard ways, e.g. by searching for terms like “cache miss” and “network latency” on the Web, and above all, by giving it careful thought and reasoning things out. (When Nobel laureate Richard Feynman was a kid, someone said in awe, “He fixes radios by thinking!”)

      Join an R Users Group, if there is one in your area. (And if not, then start one!) Talk about these things with them (though if you follow my above advice, you may find you soon know more than they do).

      Is it worth the effort? If you do computationally intensive work or have Big Data etc., all this will pay big dividends in faster code. But even if you don’t have the need, you will take pride in being a real master of R, which is absolutely its own reward, right? 🙂
