A Python-Like walk() Function for R

A really nice function available in Python is walk(), which recursively descends a directory tree, calling a user-supplied function in each directory within the tree. It might be used, say, to count the number of files, or maybe to remove all small files and so on. I had students in my undergraduate class write such a function in R for homework, and thought it may be interesting to present it here.

Among other things, readers who are not familiar with recursive function calls will learn about those here. I must add that all readers, even those with such background, will find this post to be rather subtle and requiring extra patience, but I believe you will find it worthwhile.

Let’s start with an example, in which we count the number of files with a given name:


countinst <- function(startdir,flname) {
   walk(startdir,checkname,
      list(flname=flname,tot=0))
}

checkname <- function(drname,filelist,arg) {
   if (arg$flname %in% filelist) 
      arg$tot <- arg$tot + 1
   arg
}

Say we try all this in a directory containing a subdirectory mydir that consists of a file x, and two subdirectories, d1 and d2. The latter in turn consists of another file named x. We then make the call countinst(‘mydir’,’x’).  As can be seen above, that call will in turn make the call


walk('mydir',checkname,list(flname='x',tot=0)

The walk() function, which I will present below, will start in mydir, and will call the user-specified function checkname() at every directory it encounters in the tree rooted at mydir, in this case mydir, d1 and d2.

At each such directory, walk() will pass to the user-specified function, in this case checkname(), first the current directory name, then a character vector reporting the names of all files (including directories) in that directory.  In mydir, for instance, this vector will be c(‘d1′,’d2′,’x’).  In addition, walk() will pass to checkname() an argument, formally named arg above, which will serve as a vehicle for accumulating running totals, in this case the total number of files of the given name.

So, let’s look at walk() itself:


walk <- function(currdir,f,arg) {
   # "leave trail of bread crumbs"
   savetop <- getwd()
   setwd(currdir)
   fls <- list.files()
   arg <- f(currdir,fls,arg)
   # subdirectories of this directory
   dirs <- list.dirs(recursive=FALSE)
   for (d in dirs) arg <- walk(d,f,arg)
   setwd(savetop) # go back to calling directory
   arg  
}    

The comments are hopefully self-explanatory, but the key point is that within walk(), there is another call to walk()! This is recursion.

Note how arg accumulates, as needed in this application. If on the other hand we wanted, say, simply to remove all files named ‘x’, we would just put in a dummy variable for arg. And though we didn’t need drname here, in some applications it would be useful.

For compute-intensive tasks, recursion is not very efficient in R, but it can be quite handy in certain settings.

If you would like to conveniently try the above example, here is some test code:


test <- function() {
   unlink('mydir',recursive=TRUE)
   dir.create('mydir')
   file.create('mydir/x')
   dir.create('mydir/d1')
   dir.create('mydir/d2')
   file.create('mydir/d2/x')
   print(countinst('mydir','x'))
}

 

Advertisements

4 thoughts on “A Python-Like walk() Function for R”

  1. NIce! I think it is good practice to restore the working directory with on.exit(), otherwise it will not be restored on error or interrupt.

    1. Right, I had that in the original version, and it really helped during debugging. But it does complicate things. If, say, the execution error occurs 3 levels down, the restored directory will be 2 levels down or something like that, not what we’d like. The original version I assigned to the students had an additional parameter firstcall to deal with that, making less simple.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s