Scatterplot matrices with ggplot

If you’re constantly exploring data, chances are you have already used the function pairs to produce a matrix of scatterplots. For instance, with the classic iris dataset we get the following graphic:

data(iris)
pairs(iris[,1:4])

You can color the points by species simply by typing:

pairs(iris[,1:4], col=iris$Species)
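Passing the factor iris$Species as col works because R uses the factor's integer codes (1, 2, 3) to pick colors from the current palette. Here is a minimal sketch that makes this mapping explicit and adds a legend. The color names in species_cols and the margin settings are arbitrary choices for illustration, not part of the original post:

```r
data(iris)

# Hypothetical color choice: one color per species level, in level order
species_cols <- c(setosa = "tomato", versicolor = "seagreen", virginica = "steelblue")

# Index the color vector by the factor's integer codes (1, 2, 3)
point_cols <- species_cols[as.integer(iris$Species)]

# Enlarge the bottom outer margin (oma) to leave room for a legend
pairs(iris[, 1:4], col = point_cols, pch = 19, oma = c(8, 3, 3, 3))

# Add a legend near the bottom of the device;
# xpd = NA allows drawing outside the plot region
par(xpd = NA)
legend("bottom", legend = levels(iris$Species), col = species_cols,
       pch = 19, horiz = TRUE, bty = "n")
```

Building the color vector yourself keeps the legend and the points in sync, since both come from species_cols rather than from the default palette.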

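If you’re a regular user of the ggplot2 package, you might also have used the plotmatrix function, which gives the following display (plotmatrix was later deprecated and is not part of current ggplot2 releases; the ggpairs function in the GGally package is the usual replacement):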

library(ggplot2)
plotmatrix(iris[,1:4], colour="gray20")

Adding regression lines with geom_smooth gives this:

plotmatrix(iris[,1:4], colour="gray20") +
  geom_smooth(method="lm")

Unfortunately, plotmatrix has no argument for coloring points by group. For this reason I decided to tweak its code into a customized version with added colors. Here’s the result:

I started by inspecting the code of plotmatrix to see how it worked. I realized I had to reshape the data into a long format: one block of rows per off-diagonal panel for the scatterplots, plus a separate set of values for the density curves on the diagonal. For this purpose I wrote the function makePairs. I then built a data frame called mega_iris, which adds the species labels to the reshaped data and is what gets passed to ggplot. Here’s the code in R:

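# reshape a data frame into the long format needed for a pairs plot:
# 'all' holds one block of rows per off-diagonal panel,
# 'densities' holds the values for the density curves on the diagonal
makePairs <- function(data) 
{
  # every ordered pair of distinct columns defines one off-diagonal panel
  grid <- expand.grid(x = 1:ncol(data), y = 1:ncol(data))
  grid <- subset(grid, x != y)
  all <- do.call("rbind", lapply(1:nrow(grid), function(i) {
    xcol <- grid[i, "x"]
    ycol <- grid[i, "y"]
    # xvar (facet row) names the y-axis variable, yvar (facet column) the x-axis one
    data.frame(xvar = names(data)[ycol], yvar = names(data)[xcol], 
               x = data[, xcol], y = data[, ycol], data)
  }))
  # keep the panels in the original column order
  all$xvar <- factor(all$xvar, levels = names(data))
  all$yvar <- factor(all$yvar, levels = names(data))
  # one block of raw values per variable, used for the diagonal densities
  densities <- do.call("rbind", lapply(1:ncol(data), function(i) {
    data.frame(xvar = names(data)[i], yvar = names(data)[i], x = data[, i])
  }))
  list(all=all, densities=densities)
}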

# expand iris data frame for pairs plot
gg1 = makePairs(iris[,-5])

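# new data frame mega_iris: each panel block in gg1$all holds the 150 iris rows
# in their original order, so recycling iris$Species lines up with the points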
mega_iris = data.frame(gg1$all, Species=rep(iris$Species, length=nrow(gg1$all)))

library(ggplot2)

# pairs plot: colored scatterplots off the diagonal, density curves on it
ggplot(mega_iris, aes(x = x, y = y)) + 
  facet_grid(xvar ~ yvar, scales = "free") + 
  geom_point(aes(colour=Species), na.rm = TRUE, alpha=0.8) + 
  stat_density(aes(x = x, y = ..scaled.. * diff(range(x)) + min(x)), 
               data = gg1$densities, position = "identity", 
               colour = "grey20", geom = "line")
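To make the reshape step concrete, here is a minimal sketch of the same idea on a toy data frame. It is not the makePairs function above: long_pairs, toy and toy_group are hypothetical names, and the label columns are called row_var and col_var to spell out which facet each one drives. The last lines use base R's density() only to illustrate the rescaling behind ..scaled.. * diff(range(x)) + min(x): a density whose peak is 1 gets stretched onto the variable's own range, so the curve fits a diagonal panel whose y axis shows data values. stat_density computes its curve on its own grid, so this shows the formula, not ggplot2's exact output.

```r
# Toy data: two observations of three numeric variables plus a group label.
# toy, toy_group and long_pairs are hypothetical names used only in this sketch.
toy <- data.frame(a = c(1, 2), b = c(10, 20), c = c(100, 200))
toy_group <- factor(c("g1", "g2"))

long_pairs <- function(data) {
  p <- ncol(data)
  # each ordered pair of distinct columns is one off-diagonal panel
  panels <- subset(expand.grid(xcol = 1:p, ycol = 1:p), xcol != ycol)
  do.call(rbind, lapply(seq_len(nrow(panels)), function(i) {
    xi <- panels$xcol[i]
    yi <- panels$ycol[i]
    data.frame(row_var = names(data)[yi],  # facet row: variable on the y axis
               col_var = names(data)[xi],  # facet column: variable on the x axis
               x = data[, xi],
               y = data[, yi])
  }))
}

long <- long_pairs(toy)
nrow(long)  # 2 observations x (3 * 2) panels = 12 rows

# Each panel is a block of nrow(toy) rows in the original row order,
# so recycling the group label lines it up with the points
long$group <- rep(toy_group, length.out = nrow(long))

# Diagonal trick: rescale a density (peak = 1) onto the variable's own range
d <- density(iris$Sepal.Length)
scaled <- d$y / max(d$y)
y_line <- scaled * diff(range(d$x)) + min(d$x)
range(y_line)  # stays within range(d$x); the peak reaches max(d$x)
```

The recycling step is what makes the Species trick in mega_iris safe: it relies entirely on every panel block keeping the rows in their original order.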