Snowdoop/partools Update

I’ve put together an updated version of my partools package, which includes Snowdoop, an alternative to MapReduce. Version 1.0.1 is available for download here.

To review: The idea of Snowdoop is to do your own file chunking, rather than having something like Hadoop do it for you, and then to use ordinary R code to perform parallel operations on the chunks. This avoids having to deal with new constructs, and with the complicated configuration issues of Hadoop and the R interfaces to it.
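As a rough illustration of that idea, here is a minimal sketch in base R plus the parallel package. It is not the partools code itself: the helper split_file(), the chunk naming scheme (x.1, x.2, ...), and the toy task (an overall mean) are all assumptions made for illustration.

```r
library(parallel)

# Hypothetical helper (NOT a partools function): split a text file into
# nch chunk files named basenm.1, basenm.2, ..., in contiguous blocks.
split_file <- function(basenm, nch) {
  lines <- readLines(basenm)
  idx <- splitIndices(length(lines), nch)   # list of index vectors
  chunk_files <- paste0(basenm, ".", seq_len(nch))
  for (i in seq_len(nch)) writeLines(lines[idx[[i]]], chunk_files[i])
  chunk_files
}

# Toy input file: 100 integers, one per line.
tmpdir <- tempfile("snowdoop")
dir.create(tmpdir)
basenm <- file.path(tmpdir, "x")
set.seed(1)
x <- sample(1:1000, 100, replace = TRUE)
writeLines(as.character(x), basenm)

chunk_files <- split_file(basenm, 2)

# Ordinary R parallel code: worker i reads only chunk i and returns
# a partial sum and a count.
cls <- makeCluster(2)
partial <- clusterApply(cls, chunk_files, function(fn) {
  v <- scan(fn, quiet = TRUE)
  c(sum = sum(v), n = length(v))
})
stopCluster(cls)

# The manager combines the partial results.
tot <- Reduce(`+`, partial)
overall_mean <- tot[["sum"]] / tot[["n"]]
```

The only "framework" here is a file naming convention and clusterApply(); the per-chunk function is plain R that could be tested serially with lapply() first.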

Major changes are as follows:

There is a k-means clustering example of Snowdoop in the examples/ directory. Among other things, it illustrates that the Snowdoop approach automatically gives you a “caching” effect that Hadoop lacks; it comes trivially, by default.
There is a filesort() function, to sort a distributed file, keeping the result in memory in distributed form. I don’t know yet how efficient it will be relative to Hadoop.
There are various new short utility functions, such as filesplit().
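
One way to picture the “caching” effect mentioned above is the sketch below, which is not the k-means code in examples/. Each worker reads its chunk file once and keeps it in its own memory; every later iteration ships only the current centroids. The variable name mychunk, the 1-D toy data, and the starting centroids are assumptions made for illustration.

```r
library(parallel)

# Toy data: two well-separated 1-D groups, shuffled, in two chunk files.
tmpdir <- tempfile("kmchunks")
dir.create(tmpdir)
set.seed(1)
x <- sample(c(rnorm(50, mean = 0), rnorm(50, mean = 10)))
idx <- splitIndices(length(x), 2)
chunk_files <- file.path(tmpdir, paste0("x.", 1:2))
for (i in 1:2) writeLines(as.character(x[idx[[i]]]), chunk_files[i])

cls <- makeCluster(2)

# One-time read: worker i loads chunk i into its own global environment.
invisible(clusterApply(cls, chunk_files, function(fn) {
  assign("mychunk", scan(fn, quiet = TRUE), envir = .GlobalEnv)
  NULL
}))

centers <- c(1, 9)   # arbitrary starting centroids (assumption)
for (iter in 1:10) {
  # Only the centroids travel to the workers; the data are already there.
  partial <- clusterCall(cls, function(ctrs) {
    v <- get("mychunk", envir = .GlobalEnv)
    # index of the nearest centroid for each cached point
    grp <- apply(abs(outer(v, ctrs, "-")), 1, which.min)
    k <- length(ctrs)
    sums <- vapply(seq_len(k), function(j) sum(v[grp == j]), numeric(1))
    cnts <- as.numeric(tabulate(grp, nbins = k))
    cbind(sums, cnts)
  }, centers)
  tot <- Reduce(`+`, partial)
  # Assumes no cluster ends up empty, which holds for this toy data.
  centers <- tot[, "sums"] / tot[, "cnts"]
}

stopCluster(cls)
```

Nothing special was done to get the reuse: the chunk simply stays in each worker's R session between calls, so no file is reread on later iterations.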

Not on GitHub yet, but Yihui should be happy that I converted the Snowdoop vignette to use knitr. 🙂

All of this is still preliminary, of course. It remains to be seen to what scale this approach will work well.

4 thoughts on “Snowdoop/partools Update”

Thanks! Okay, I downloaded the tar ball and opened it (which I normally would not do). The vignette was still using Sweave. If the package is on Github, I can send you a pull request in two minutes to fix this issue, and you can accept it in 10 seconds. If it is a tar ball, there will be more steps back and forth through emails, and my brain will start to hurt just by thinking of that 🙂

matloff says:

January 2, 2015 at 11:30 pm

Yeah, when I said knitr, I meant the general category of “R textiles.” 🙂 I do recommend knitr to my students, but I wound up stopping short of using it here.

I agree with your analysis of the virtues of GitHub. If partools/Snowdoop grows to a larger, more complex state, it would definitely be worthwhile.

Yihui Xie says:

January 7, 2015 at 3:42 pm

Haha, thanks!

You're welcome. Long live R textiles! 🙂

Mad (Data) Scientist

Musings, useful code etc. on R and data science
