The freqparcoord Package for Multivariate Visualization

Recently my student Yingkang Xie and I have developed freqparcoord, a novel approach to the parallel coordinates method for multivariate data visualization. Our approach:

Addresses the screen-clutter problem in parallel coordinates, by only plotting the “most typical” cases, meaning those with the highest estimated multivariate density values. This makes it easier to discern relations between variables.
Also allows plotting the “least typical” cases, i.e. those with the lowest density values, in order to find outliers.
Allows plotting only cases that are “local maxima” in terms of density, as a means of performing clustering.
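
To make the "most typical cases" idea concrete, here is a minimal base-R sketch, not the package's actual implementation. It assumes a k-nearest-neighbor density estimate and simulated data; the helper names knn_density() and most_typical() are hypothetical and chosen for illustration only.

```r
# Sketch only: knn_density() and most_typical() are hypothetical helpers,
# not functions from freqparcoord.

# k-NN density estimate, up to a constant factor, at each row of x
knn_density <- function(x, k) {
  d <- as.matrix(dist(x))                        # pairwise Euclidean distances
  # distance to the k-th nearest neighbor (position 1 is the row itself)
  h <- apply(d, 1, function(r) sort(r)[k + 1])
  if (any(h == 0)) stop("duplicate data points, try larger k or jitter()")
  (k / nrow(x)) / (h^ncol(x))
}

# indices and scaled values of the m rows with the highest estimated density
most_typical <- function(x, m, k = 10) {
  z <- scale(as.matrix(x))                       # put variables on a common scale
  dens <- knn_density(z, k)
  idx <- order(dens, decreasing = TRUE)[seq_len(m)]
  list(idx = idx, z = z[idx, , drop = FALSE], dens = dens)
}

set.seed(1)
x <- matrix(rnorm(200 * 3), ncol = 3,
            dimnames = list(NULL, c("v1", "v2", "v3")))
res <- most_typical(x, m = 5)

# parallel coordinates plot of only the 5 most typical rows
matplot(t(res$z), type = "l", lty = 1, xaxt = "n",
        xlab = "", ylab = "scaled value")
axis(1, at = seq_len(ncol(x)), labels = colnames(x))
```

Changing decreasing = TRUE to decreasing = FALSE in this sketch would select the least typical rows instead, which corresponds to the outlier-hunting use described above.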

The user has the option of having the computation done in parallel. (See http://heather.cs.ucdavis.edu/paralleldatasci.pdf for a partial draft of my book, Parallel Computing for Data Science: with Examples from R and Beyond, to be published by Chapman & Hall later this year. Comments welcome.)

For a quick intro to freqparcoord, download it from CRAN and load it into R. Type ?freqparcoord and run the examples, making sure to read the comments.

One of the examples, whose plot is shown below, involves baseball player data, courtesy of the UCLA Statistics Dept. Here we’ve plotted the 5 most typical lines for each position. We see that catchers tend to be shorter, heavier and older, while pitchers tend to be taller, lighter and younger.
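
The quick-start steps above might look roughly like this in an R session. This is a sketch under assumptions: the install line is commented out because it needs network access, and the final call (dataset name mlb, 5 lines per group, columns 4:6 displayed, column 7 as the grouping variable) is my reading of the package's help example, so check names(mlb) and ?freqparcoord before relying on it.

```r
# One-time download from CRAN (commented out; requires network access):
# install.packages("freqparcoord")

# Only run the package steps if freqparcoord is actually installed.
has_pkg <- requireNamespace("freqparcoord", quietly = TRUE)

if (has_pkg) {
  library(freqparcoord)

  # Open the help page; its examples carry explanatory comments.
  help("freqparcoord")

  # Assumed form of the baseball example: the 5 most typical lines
  # within each group, for the columns taken to be height, weight and age.
  data(mlb)
  print(names(mlb))              # verify the column positions first
  freqparcoord(mlb, 5, 4:6, 7)
} else {
  message("freqparcoord is not installed; see the install line above")
}
```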

7 thoughts on “The freqparcoord Package for Multivariate Visualization”

Pingback: More on freqparcoord | Mad (Data) Scientist
normaldeviate says:

April 10, 2014 at 6:28 am

Norm:

Great R package!
Is there some way to optimize the ordering of the variables
in the parallel lines plot?

–Larry

matloff says:

April 10, 2014 at 10:39 pm

Glad you like it, Larry. What some parallel coordinates packages do is to allow the user to interactively reorder the vertical axes. I believe we’ll be able to do that without recomputation, by utilizing the structure of the ggplot2 object. Good suggestion for the next version!
  
Roy Robertson says:

April 23, 2014 at 10:55 am

Is the next to the last line of code in the brackets in the knndens function redundant?

function (data, k)
{
    dsts <- get.knn(data, k = k)$nn.dist
    hvec <- dsts[, k]
    if (any(hvec == 0))
        stop("duplicate data points, try larger k or jitter()")
    1/(hvec^ncol(data)) #"Volume" around x #Redundant?
    (k/nrow(data))/(hvec^ncol(data)) #(k/N)/(Volume around x)
}
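
One way to check this is a small base-R experiment, sketched below under assumptions. An R function returns the value of its last evaluated expression, so an unassigned expression just before it should have no effect on the result. The helper kth_dist() is a hypothetical stand-in for the get.knn() call (which appears to come from the FNN package), used only so the sketch runs without extra packages.

```r
# Hypothetical stand-in for get.knn(data, k = k)$nn.dist[, k]:
# distance from each row to its k-th nearest neighbor.
kth_dist <- function(data, k) {
  d <- as.matrix(dist(data))
  apply(d, 1, function(r) sort(r)[k + 1])   # position 1 is the row itself
}

with_extra_line <- function(data, k) {
  hvec <- kth_dist(data, k)
  1/(hvec^ncol(data))                 # computed, never assigned, then discarded
  (k/nrow(data))/(hvec^ncol(data))    # last expression: this is the return value
}

without_extra_line <- function(data, k) {
  hvec <- kth_dist(data, k)
  (k/nrow(data))/(hvec^ncol(data))
}

set.seed(1)
x <- matrix(rnorm(60), ncol = 2)

# Compare the two versions on the same data.
same <- identical(with_extra_line(x, 5), without_extra_line(x, 5))
print(same)
```

If the two versions agree, the next-to-last line only costs a little computation and could be dropped without changing what the function returns.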

Pingback: Curated list of R tutorials for Data Science – the data science blog
Pingback: Curated list of R tutorials for Data Science - Meetkumar
Pingback: R Data Science Tutorials - 算法网
