Got Plot? Clean Lines

Advice for plotting several data series with lines

Let’s suppose we have some data that we can represent with lines (e.g. several time series):

# generate some fake data
set.seed(321)
y1 = sort(runif(10, 0, 1))
y2 = sort(runif(10, 0, 1))
y3 = sort(runif(10, 0, 1))
y4 = sort(runif(10, 0, 1))
y5 = sort(runif(10, 0, 1))
y6 = sort(runif(10, 0, 1))
some_data = rbind(y1, y2, y3, y4, y5, y6)

The default approach in R

When we have data that can be represented with several lines, many people automatically want to show them all in a single chart. The common approach in this situation is to combine dash patterns and point shapes to differentiate the lines. This is bad practice: it only creates confusion and makes the lines hard to tell apart. Don’t your eyes hurt?

# common "default" plot
plot(0, xlab="x axis", ylab="y axis", xlim=c(1,10), ylim=c(0,1), type="n")
title("Don't do this, please")
for (i in 1:6)
{
  lines(1:10, some_data[i,], lty=i)
  points(1:10, some_data[i,], pch=i)
}

The false remedy approach

As a remedy, or simply as decoration, some people add colors. This is not necessarily a bad idea, but the problem is that most users do not know how to choose colors wisely.

# common plot with colors
plot(0, xlab="x axis", ylab="y axis", xlim=c(1,10), ylim=c(0,1), type="n")
title("Don't do this either, please")
for (i in 1:6)
{
  lines(1:10, some_data[i,], lty=i, col=rainbow(6)[i])
  points(1:10, some_data[i,], pch=i, col=rainbow(6)[i])
}
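One way to see why rainbow() is a risky choice is to compare how bright its colors are. The sketch below is only an illustration, not part of the original recipe: the helper luma() is a hypothetical name, it uses the Rec. 601 weights as a rough stand-in for perceived brightness, and it assumes R >= 3.6.0 so that hcl.colors() and its "Dark 3" palette are available.

```r
# Rough brightness (Rec. 601 luma, 0-255 scale) of a vector of colors.
# luma() is a hypothetical helper written for this illustration.
luma <- function(cols) {
  rgb <- col2rgb(cols)  # 3 x n matrix with rows red, green, blue
  as.vector(c(0.299, 0.587, 0.114) %*% rgb)
}

rainbow_cols <- rainbow(6)
# assumption: hcl.colors() requires R >= 3.6.0
hcl_cols <- hcl.colors(6, palette = "Dark 3")

# gap between the darkest and the brightest color in each palette
rainbow_spread <- diff(range(luma(rainbow_cols)))
hcl_spread <- diff(range(luma(hcl_cols)))
print(round(c(rainbow = rainbow_spread, hcl = hcl_spread)))
```

A large gap means some lines (yellow, for instance) nearly vanish against a white background while others dominate, so the reader's attention is steered by the palette rather than by the data.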

One possible solution

The first piece of advice is to limit the number of lines in a single chart to no more than four, and to use solid lines with slightly different hues and/or widths. Keep in mind that the raison d’être of a multiple-line chart is to compare and contrast different data sets. If we plot too many lines on the same chart, we defeat that purpose.

# better plot (with three or four lines)
# shades of gray, from dark (gray30) to light (gray90)
cols = paste("gray", seq(30, 90, length.out=4), sep="")
# line widths
lwds = 1:4
# plot
plot(0, xlab="x axis", ylab="y axis", xlim=c(0.5,10), ylim=c(0,1),
     type="n", xaxs="i", yaxs="i", axes=FALSE)
axis(side=1, pos=0, at=0:10, lwd.ticks=0.2, col="gray70")
axis(side=2, pos=0.6, las=2, lwd.ticks=0.2, col="gray70")
title(c("A more reader friendly plot", "with three or four lines"), cex.main=0.9)
for (i in 1:4)
{
  lines(1:10, some_data[i,], col=cols[i], lwd=lwds[i])
}
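As a side note, the loop over lines() can be written more compactly with matplot() from base graphics, which draws each column of a matrix as its own line. The following is a minimal sketch under a few assumptions: the data are regenerated so the block runs on its own, the custom axes from the plot above are left out, and the matrix is transposed because some_data stores one series per row.

```r
# regenerate the fake data so this block is self-contained
set.seed(321)
some_data <- t(replicate(6, sort(runif(10, 0, 1))))  # 6 series x 10 points

# same gray shades and line widths as in the plot above
cols <- paste0("gray", seq(30, 90, length.out = 4))
lwds <- 1:4

# matplot() expects one series per column, hence the transpose
matplot(1:10, t(some_data[1:4, ]), type = "l", lty = 1,
        col = cols, lwd = lwds,
        xlab = "x axis", ylab = "y axis", ylim = c(0, 1))
title("Four solid lines drawn with matplot()", cex.main = 0.9)
```

Setting lty = 1 matters here: by default matplot() cycles through line types, which would bring back the dashed patterns we are trying to avoid.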

Another possible solution

When there are more than four lines, an effective alternative is to graph the data as a panel of small charts, one per line. This makes it easier to spot patterns, trends, outliers, and so on. As long as every panel uses the same axis limits, we gain clarity of comparison while keeping each individual line.

# another good plot with all lines
par(mfrow = c(2,3), mar = c(3, 3, 3, 1))
# one panel per line (panels are filled row by row)
for (i in 1:6)
{
  plot(0, xlab="", ylab="", xlim=c(1,10), ylim=c(0,1), type="n", axes=FALSE, xaxs="i", yaxs="i")
  #abline(v=1, h=0, col="gray80", lwd=2)
  axis(side=1, at=1:10, col="gray80",lwd.ticks=0.2, col.ticks="gray80", col.axis="gray80")
  axis(side=2, las=2, col="gray80", lwd.ticks=0.2, col.ticks="gray80", col.axis="gray80")
  title(paste("Line", i), cex.main=0.9)
  lines(1:10, some_data[i,], col="gray70", lwd=2.5)
}
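One practical detail: par(mfrow = ...) stays in effect for every later plot on the same device. A minimal sketch of a common way to handle this is shown below, assuming we want to return to a single-panel layout afterwards. It saves the old settings returned by par() and restores them at the end; the data are regenerated so the block runs on its own, and the panels are simplified compared with the version above.

```r
# regenerate the fake data so this block is self-contained
set.seed(321)
some_data <- t(replicate(6, sort(runif(10, 0, 1))))

# par() returns the previous values of the settings it changes
op <- par(mfrow = c(2, 3), mar = c(3, 3, 3, 1))
for (i in 1:6) {
  # identical xlim and ylim in every panel keep the panels comparable
  plot(1:10, some_data[i, ], type = "l", col = "gray70", lwd = 2.5,
       xlab = "", ylab = "", xlim = c(1, 10), ylim = c(0, 1), las = 1)
  title(paste("Line", i), cex.main = 0.9)
}
# restore the previous layout and margins
par(op)
```

After par(op), the next call to plot() uses the layout and margins that were in place before the panel was drawn.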
