In this post we will talk about the R package “arcdiagram” for plotting pretty arc diagrams like the one below:

Les Miserables arc diagram (screenshot)
Arc Diagrams
An arc diagram is a graphical display for visualizing graphs or networks in a one-dimensional layout. The main idea is to place the nodes along a single axis and to represent the edges (the connections between nodes) as arcs. One disadvantage of arc diagrams is that they may not convey the overall structure of a network as effectively as a two-dimensional layout; however, with a good ordering of the nodes, clusters and bridges become easy to identify. Further, annotations and multivariate data can easily be displayed alongside the nodes.
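To make the one-dimensional layout concrete, here is a minimal base-R sketch of the idea. It is my own illustration, not how arcdiagram draws things internally: it assumes the nodes sit at positions 1..n on the x-axis and draws each edge as a half circle whose diameter spans its two endpoints. `arc_coords()` and `draw_arc_diagram()` are hypothetical helper names.

```r
# Hypothetical helpers illustrating an arc diagram layout (not arcdiagram's code).
# Nodes are placed at x = 1..n on a horizontal axis; each edge becomes a
# half circle above the axis whose diameter spans its two endpoints.
arc_coords <- function(from, to, n_points = 50) {
  center <- (from + to) / 2
  radius <- abs(to - from) / 2
  theta <- seq(0, pi, length.out = n_points)
  cbind(x = center + radius * cos(theta),
        y = radius * sin(theta))
}

draw_arc_diagram <- function(edgelist, n_nodes, labels = seq_len(n_nodes)) {
  max_radius <- max(abs(edgelist[, 2] - edgelist[, 1])) / 2
  plot.new()
  plot.window(xlim = c(0.5, n_nodes + 0.5), ylim = c(-0.5, max_radius))
  # one arc per row of the edge list
  for (i in seq_len(nrow(edgelist))) {
    lines(arc_coords(edgelist[i, 1], edgelist[i, 2]), col = "gray40")
  }
  # nodes on the axis, labels just below them
  points(seq_len(n_nodes), rep(0, n_nodes), pch = 21, bg = "white", cex = 1.5)
  text(seq_len(n_nodes), rep(-0.3, n_nodes), labels, cex = 0.8)
}
```

For example, `draw_arc_diagram(rbind(c(1, 4), c(2, 4), c(3, 5), c(3, 6)), n_nodes = 6, labels = letters[1:6])` draws four arcs over six nodes. The picture depends entirely on which position each node gets, which is why the ordering of the nodes matters so much.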
Some inspiration
I got hooked on arc diagrams the first time I saw the famous Similar Diversity graphic by Philipp Steinweber and Andreas Koller. I was so captivated by this diagram that I eventually made my own attempt to replicate it using the Star Wars movie scripts (see this post and these slides).
Arc Diagram: Les Misérables
Another really cool example of an arc diagram can be found in the examples gallery of Protovis (by Mike Bostock):
The diagram above is based on a network representation of character co-occurrence in the chapters of Victor Hugo’s classic novel Les Misérables. The original data set is from The Stanford GraphBase: A Platform for Combinatorial Computing (by Donald Knuth). The node colors indicate cluster memberships. You can find related files with the character co-occurrence network in Protovis and Gephi:
- Protovis: miserables.js (json format)
- Gephi: lesmiserables.gml (GML format)
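If you are curious how a co-occurrence network like this can be built, here is a minimal base-R sketch. The chapter lists are made up, and the edge weight is simply the number of chapters two characters share; that is an assumption about how such weights are typically defined, not a description of Knuth's exact procedure. `cooccurrence_edges()` is a hypothetical helper name.

```r
# Hypothetical toy data: the characters appearing in each chapter
chapters <- list(
  c("Valjean", "Javert", "Fantine"),
  c("Valjean", "Cosette"),
  c("Valjean", "Javert")
)

# Build a weighted co-occurrence edge list: one row per pair of characters
# sharing at least one chapter, weighted by the number of shared chapters.
cooccurrence_edges <- function(chapters) {
  pairs <- do.call(rbind, lapply(chapters, function(chars) {
    chars <- sort(unique(chars))
    if (length(chars) < 2) return(NULL)  # nothing to pair up
    t(combn(chars, 2))                   # all unordered pairs, one per row
  }))
  # count how often each pair appears across chapters
  counts <- table(paste(pairs[, 1], pairs[, 2], sep = "\t"))
  ends <- do.call(rbind, strsplit(names(counts), "\t", fixed = TRUE))
  data.frame(from = ends[, 1], to = ends[, 2],
             value = as.vector(counts), stringsAsFactors = FALSE)
}

cooccurrence_edges(chapters)
```

With this toy input, Javert and Valjean share two chapters, so their edge gets a weight of 2 while every other pair gets 1.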
Les Misérables Arc in R
The R package arcdiagram has been designed to help you plot pretty arc diagrams of graphs in R. You can think of it as a plugin for the package igraph (by Gabor Csardi and Tamas Nepusz), although you could also make it work with network (by Carter Butts et al). arcdiagram lives in one of my GitHub repositories; the complete documentation of the package, as well as some basic examples, is available at:
www.gastonsanchez.com/arcdiagram.
1) Installation
To install arcdiagram you will need to use the function install_github from the package devtools (by Hadley Wickham):
# install devtools
install.packages("devtools")
# load devtools
library(devtools)
# install arcdiagram
install_github('gastonstat/arcdiagram')
# load arcdiagram
library(arcdiagram)
2) Download the gml file ‘lesmiserables.txt’
After installing arcdiagram, the next step is to download the data file lesmiserables.txt that contains the graph in GML format. The file is available at www.gastonsanchez.com/lesmiserables.txt
In my case, I saved the file as "/Users/gaston/lesmiserables.txt" (your path will be different). Once you have the graph file, you can import it into R with the function read.graph like so:
# load igraph (provides read.graph and the graph functions used below)
library(igraph)
# location of 'gml' file
mis_file = "/Users/gaston/lesmiserables.txt"
# read 'gml' file
mis_graph = read.graph(mis_file, format="gml")
3) Extracting graph attributes
The main function in arcdiagram is arcplot. This function requires an edgelist as its primary ingredient (an edge list is just a two-column matrix that lists the edges of a graph, one per row). The rest of its arguments are graphical parameters to play with.
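To see why a two-column matrix is enough, here is a small base-R sketch on toy data of my own that converts an edge list into an adjacency matrix and back. For n nodes the adjacency matrix always has n^2 cells, while the edge list only needs one row per edge, which makes it a compact input for arcplot.

```r
# A small undirected graph as an edge list: one row per edge,
# each cell holding a node index (hypothetical toy data)
edgelist <- rbind(c(1, 2), c(1, 3), c(2, 3), c(3, 4))

# The same graph as an adjacency matrix, to show what the edge list encodes
n_nodes <- max(edgelist)
adjacency <- matrix(0L, n_nodes, n_nodes)
adjacency[edgelist] <- 1L         # matrix indexing: each row is a (row, col) cell
adjacency[edgelist[, 2:1]] <- 1L  # undirected graph: mirror every edge

# ...and back: the nonzero cells above the diagonal are the edges
edges_back <- which(upper.tri(adjacency) & adjacency == 1L, arr.ind = TRUE)
```

The row sums of the adjacency matrix give each node's degree, which is one of the attributes extracted below.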
Most of the information that we need to reproduce the arc diagram is already contained in the gml file as vertex and edge attributes. The trick is to extract the values with the functions get.vertex.attribute and get.edge.attribute:
# get edgelist
edgelist = get.edgelist(mis_graph)
# get vertex labels
vlabels = get.vertex.attribute(mis_graph, "label")
# get vertex groups
vgroups = get.vertex.attribute(mis_graph, "group")
# get vertex fill color
vfill = get.vertex.attribute(mis_graph, "fill")
# get vertex border color
vborders = get.vertex.attribute(mis_graph, "border")
# get vertex degree
degrees = degree(mis_graph)
# get edges value
values = get.edge.attribute(mis_graph, "value")
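For a rough idea of what degree() returns, here is a base-R sketch that computes node degree directly from an edge list, plus a weighted version that sums edge values, assuming the weights come in the same row order as the edge list. The helper names are hypothetical, and the toy graph has no self-loops or repeated edges, so I would only expect agreement with igraph's degree() in that simple setting.

```r
# Hypothetical helpers: degree and weighted degree from an edge list.
# Every edge adds one to the count of each of its two endpoints.
edge_degree <- function(edgelist, n_nodes = max(edgelist)) {
  endpoints <- factor(as.vector(edgelist), levels = seq_len(n_nodes))
  as.vector(table(endpoints))  # zero for nodes without edges
}

# Weighted version: as.vector() reads column 1 then column 2,
# so the edge values are repeated twice to stay aligned with the endpoints.
edge_strength <- function(edgelist, values, n_nodes = max(edgelist)) {
  endpoints <- factor(as.vector(edgelist), levels = seq_len(n_nodes))
  as.vector(tapply(rep(values, times = 2), endpoints, sum, default = 0))
}

toy_edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(3, 4))
edge_degree(toy_edges)
edge_strength(toy_edges, values = c(1, 2, 3, 4))
```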
4) Nodes ordering
Next we need an ordering for the nodes, which we can get with the function arrange() from the package plyr (by Hadley Wickham). The idea is to create a data frame with the variables 'vgroups', 'degrees', 'vlabels', and a numeric index 'ind' for the nodes. We then sort the data frame in descending order, first by 'vgroups' and then by 'degrees'; what we want is the resulting order of 'ind':
# load plyr (provides arrange and desc)
library(plyr)
# data frame with vgroups, degree, vlabels and ind
x = data.frame(vgroups, degrees, vlabels, ind=1:vcount(mis_graph))
# arranging by vgroups and degrees
y = arrange(x, desc(vgroups), desc(degrees))
# get ordering 'ind'
new_ord = y$ind
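If you'd rather not load an extra package for this step, base R's order() can produce the same kind of ordering. Here is a minimal sketch on made-up values (not the Les Misérables data): negating the numeric keys sorts both of them in descending order, with ties in 'vgroups' broken by 'degrees'.

```r
# Hypothetical toy data standing in for vgroups, degrees and vlabels
x <- data.frame(vgroups = c(1, 2, 2, 1, 3),
                degrees = c(5, 1, 4, 2, 3),
                vlabels = c("A", "B", "C", "D", "E"),
                ind = 1:5)

# descending by group, then descending by degree within each group
new_ord <- x$ind[order(-x$vgroups, -x$degrees)]
new_ord
```

Here node 5 comes first (group 3), then nodes 3 and 2 (group 2, degree 4 before degree 1), then nodes 1 and 4 (group 1).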
5) Plot arc diagram
Now that we have all the elements for arcplot (edgelist, nodes ordering, graphical attributes), we are ready to plot the arc diagram. Here’s the code in R:
# plot arc diagram
arcplot(edgelist, ordering=new_ord, labels=vlabels, cex.labels=0.8,
show.nodes=TRUE, col.nodes=vborders, bg.nodes=vfill,
cex.nodes = log(degrees)+0.5, pch.nodes=21,
lwd.nodes = 2, line=-0.5,
col.arcs = hsv(0, 0, 0.2, 0.25), lwd.arcs = 1.5 * values)
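A quick look at two of the graphical formulas above, evaluated on hypothetical degree values: log(degrees) + 0.5 gives degree-1 nodes a size of 0.5 and compresses the sizes of highly connected nodes, while hsv(0, 0, 0.2, 0.25) is a dark grey at 25% opacity, so overlapping arcs build up darker bands.

```r
degrees <- c(1, 2, 10, 36)           # hypothetical node degrees
node_cex <- log(degrees) + 0.5       # same formula as cex.nodes above
round(node_cex, 2)

arc_col <- hsv(0, 0, 0.2, 0.25)      # zero saturation = grey, value 0.2, alpha 0.25
arc_col                              # an "#RRGGBBAA" string usable as col.arcs
```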
Happy plotting!


One question: When is this going to become a CRAN package? Nice work, I can see many great uses for it. Thank you for sharing.
Have you heard of Voyant Tools? They have some really awesome, free-to-use online text tools that generate interesting diagrams like these. I use these tools a lot in my text analysis, and I use them to style my network diagrams when they need a more polished look.
Thanks for the awesome post,
Greg
I’m interested in this kind of visualization. I’m a beginner and have one basic question: how do I prepare the input file “lesmiserables.txt”? Do I need to do some work to prepare the input data?
Thanks.
Linda
Hi Linda, the file “lesmiserables.txt” is ready to be used (no need to touch it). Just follow the R code.
I think it would be nice to add it to the R Programming wikibook: http://en.wikibooks.org/wiki/R_Programming/Graphics#Graphics_gallery
Hello,
this looks really interesting! But, as a beginner, I would like to know how to prepare my own data and convert it into GML format. I found something on Stack Overflow (http://stackoverflow.com/questions/12751497/how-to-convert-csv-file-containing-network-data-into-gml), but my question is: which “network elements” should go in the columns of that CSV file? Or, taking your Star Wars example: how does the arcdiagram package know whether “Luke” is a node or a branch/arc?
Do you know what I mean?
Keep up the great work!
Best wishes
Daniel
Hi Daniel, the GML format is just one of many formats for defining graphs; the important part for ‘arcdiagram’ is the edgelist (a two-column matrix defining the edges between nodes). You can check the introductory documentation of the package at: http://www.gastonsanchez.com/arcdiagram
To know more about ‘igraph’ see: http://igraph.sourceforge.net/
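Regarding Daniel's question: for a CSV edge list, the convention I would assume is that each row is one connection (one arc) and its two columns name the nodes at either end. So “Luke” is a node because it appears in a cell, and every row is an arc. Here is a base-R sketch with made-up Star Wars pairs; the column names `source` and `target` are my own choice, not something arcdiagram requires.

```r
# Hypothetical CSV content: one row per connection between two characters
csv_text <- "source,target
Luke,Leia
Luke,Han
Han,Chewbacca"

edges_df <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Two-column character edgelist: column 1 = one endpoint, column 2 = the other
edgelist <- as.matrix(edges_df[, c("source", "target")])

# Optional: integer node indices, numbering nodes by first appearance
nodes <- unique(as.vector(edgelist))
int_edgelist <- matrix(match(edgelist, nodes), ncol = 2)
```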
Hi Gaston,
thanks, I really like your visualisation and ran a number of tests with it. However, I ran into a problem: the weight of a tie between two nodes does not seem to be visualised correctly (a low weight but a thick line), even though the edgelist and the weights match. Did you encounter this problem, too?
Thanks again,
Tobias
Hi Tobias, thanks for reporting the issue. I haven’t had this problem but I’ll take a look at it.
Hi Gaston,
this package is really interesting but I have a problem.
Suppose you have this edgelist:
> edgelist
     [,1] [,2]
[1,]    1    4
[2,]    2    4
[3,]    3    5
[4,]    3    6
and these labels:
> vlabels
[1] "a" "b" "c" "d" "e" "f"
If I run
arcplot(edgelist, labels=vlabels)
why do I obtain these edges: a – b, b – c, d – e, d – f
instead of a – d, b – d, c – e, c – f?
Where is my mistake?
Thank you a lot,
Dani
Hi Daniel,
Here is the solution:
edgelist = rbind(
c(1, 4),
c(2, 4),
c(3, 5),
c(3, 6))
vlabels = c("a", "b", "c", "d", "e", "f")
# get x-axis coordinates of nodes
node_coords = xynodes(edgelist)
# nodes are plotted in this order
aux_index = as.numeric(names(node_coords))
# specify labels accordingly
arcplot(edgelist, labels=vlabels[aux_index])
I know it’s kind of weird and unintuitive (I’ll need to make some changes).
Thank you!
Yes, it’s not very intuitive, and I think the best solution is to create the edge matrix with the names of the nodes:
edgelist = rbind(
  c("a", "d"),
  c("b", "d"),
  c("c", "e"),
  c("c", "f"))
and then simply
arcplot(edgelist, labels=NULL, sorted=TRUE)
with sorted=TRUE if you want to sort the nodes.
Bye,
Dani
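Rather than typing the named edge list by hand, a numeric edgelist can be converted by indexing the label vector with it. This is a minimal base-R sketch of Dani's suggestion, assuming label i belongs to node i; the result is the same character matrix as above.

```r
edgelist <- rbind(c(1, 4), c(2, 4), c(3, 5), c(3, 6))
vlabels <- c("a", "b", "c", "d", "e", "f")

# Indexing a plain vector with a matrix returns a vector in column-major
# order, so the two-column shape is rebuilt with matrix(..., ncol = 2)
named_edgelist <- matrix(vlabels[edgelist], ncol = 2)
named_edgelist
```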
I have R v2.13.2, and 2.15.3, and arcdisplay is not available for either version.
I’ve tried to use 2.11.0 also, and it said arcdisplay could not be found…??????
Hi Tom,
The R package arcdiagram should be installed from my github repository. You can find the necessary instructions at https://www.gastonsanchez.com/arcdiagram
Thank you for your work and help, Gaston.