Adapting the Co-Occurrence Matrix Display

Duncan Temple Lang

University of California at Davis

Department of Statistics


Background & Goals

We explored in R how to use a square matrix display to present an adjacency matrix. We looked at the co-sponsorship graph for the current Senate and for the last 15 Senates, and we have the same information for the House. In these plots, the order of the rows and columns is important because different orderings reveal different features. We can animate changes in that order, and we can also make the display interactive. This document discusses how we took Mike Bostock's original example of the co-occurrences of characters in Les Miserables, at http://bost.ocks.org/mike/miserables/, and added features to it.

The following are the features we might like to add:

  • Hyperlinks to the people's names

  • link the hyperlinks to the govtrack_id's

  • color the names by party

  • tooltips for the rectangles in the matrix to give counts

  • click on a rectangle in the matrix to get a list of the co-sponsored bills

  • click on the rectangle and highlight all the senators with whom the person co-sponsored a bill.

Creating the Initial Plot

First, we start with the adjacency matrix. We computed this with the code in senate1.R and funs.R. The raw data are in Congress113.zip, and data from the other congresses are available from http://govtrack.us or from me. The simplest approach is to load the RDA file with the 30 matrices (15 for each of the House and Senate).

The original and annotated HTML file from Mike Bostock's example is orginalMatrix.html. It is shown below. I've removed the text and other less important details to simplify understanding the document.

The data for the matrix is loaded via the call

d3.json("miserables.json", function(miserables) {
     .....
 })

The body of this function definition makes up most of the code to create the plot and indeed the HTML document.

We change the name of the file that contains the data for our network to senateMatrix.json and we create this in R from our matrix representing the network. We create this with

cat(jsonlite::toJSON( list(nodes = Nodes, links = Links)), file = "senateMatrix.json")

or

cat(RJSONIO::toJSON( list(nodes = Nodes, links = Links), byrow = TRUE), file = "senateMatrix.json")

But before this, we have to create the variables Nodes and Links. We do this with the code in senate1.R. This looks lengthy, but we use the same data for numerous plots. We start by loading the data and obtaining the matrix for the current Senate:

print(load("15Sessions.rda"))
sen = ms[[length(ms)]]  # the final/current congress

To simplify our network and focus on the most important elements within it, we only consider links where there are 31 or more co-sponsored bills between a pair of senators as constituting a link:

Threshold = 30
m = sen > Threshold

This creates a logical (TRUE/FALSE) matrix, which R treats as binary (1/0) in arithmetic.

Our Links variable is a data frame consisting of the source, target pairs for our edges and these should be indices into the set of nodes. To do this, we first turn our adjacency matrix into an edge list using igraph:

library(igraph)
el = as_edgelist(graph.adjacency(m))
el = as.data.frame(el, stringsAsFactors = FALSE)

The first few rows of these are

head(el)
      V1     V2
1 412494 412493
2 412494 412251
3 412494 400034
4 412494 400040
5 412494 400061
6 412494 300023

These strings are the govtrack identifiers for the senators. We can look these up in the people data frame.

We have to match these source and target pairs to the nodes. We can do this with

ids = colnames(m)
Links = data.frame(source = match(el[,1], ids) - 1L,
                   target = match(el[,2], ids) - 1L)

We have subtracted 1 from each of these indices since they are for JavaScript and so should be 0-based.
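To see why the shift matters, here is a small JavaScript sketch. The three node names and the 1-based pairs are made up for illustration and are not taken from the senate data. R's match() returns positions starting at 1, while the first element of a JavaScript array is at index 0.

```javascript
// Hypothetical nodes, in the same order as the columns of the matrix in R.
var nodes = [{name: "Senator A"}, {name: "Senator B"}, {name: "Senator C"}];

// Positions as R's match() would return them (1-based).
var fromR = [{source: 1, target: 3}, {source: 2, target: 3}];

// The same links after subtracting 1, as done when building Links.
var links = fromR.map(function(l) {
  return {source: l.source - 1, target: l.target - 1};
});

// With 0-based indices every lookup succeeds.
var labels = links.map(function(l) {
  return nodes[l.source].name + " / " + nodes[l.target].name;
});

// Without the shift, position 3 falls off the end of a 3-element array.
var missing = nodes[fromR[0].target];
```

Here missing is undefined, which is the kind of value that would later cause an error when the plotting code asks for its name.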

In addition to just the source and target for each link, we want information about the link. We'll add the number of bills (co-)sponsored by these two senators together. This comes from the original non-thresholded matrix sen. We can get these with

counts = sen[ as.matrix(el) ]
Links$value = as.integer(cut(counts, quantile(counts, c(0, .2, .4, .6, .8, 1)),
                             include.lowest = TRUE))  # otherwise the minimum count becomes NA

This puts the counts into different groups/categories based on the distribution, specifically quintiles, of these counts.
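To make the grouping rule concrete, here is the same arithmetic sketched in JavaScript with made-up counts. It assumes the linear-interpolation quantile that R's quantile() uses by default, and intervals closed on the right with the minimum placed in the first group (in R, cut() needs include.lowest = TRUE for that; otherwise the minimum becomes NA). The function names are hypothetical.

```javascript
// Quantile at probability k / parts, by linear interpolation between
// order statistics. `sorted` must be in ascending order.
function quantileAt(sorted, k, parts) {
  var h = (sorted.length - 1) * k / parts;
  var lo = Math.floor(h), hi = Math.ceil(h);
  return sorted[lo] + (h - lo) * (sorted[hi] - sorted[lo]);
}

// Assign each count to a group 1..5 using right-closed intervals,
// with the minimum value included in group 1.
function quintileGroups(counts) {
  var sorted = counts.slice().sort(function(a, b) { return a - b; });
  var breaks = [];
  for (var k = 0; k <= 5; k++) breaks.push(quantileAt(sorted, k, 5));
  return counts.map(function(v) {
    for (var g = 1; g <= 5; g++) {
      if (v <= breaks[g]) return g;
    }
    return null;  // not reached: breaks[5] is the maximum
  });
}

var exampleCounts = [31, 35, 40, 50, 80, 120];   // made-up counts
var exampleGroups = quintileGroups(exampleCounts);
```

For these six values the breaks are the values themselves, so the two smallest counts share group 1 and the largest lands in group 5.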

We next need the Nodes data frame. For each node, we want the name, the group (the party with which the senator is affiliated), and a size. The code below computes size as the number of links the senator has in the thresholded matrix m. We obtain this with

i = match(ids, people$govtrack_id)
Nodes = data.frame(name = paste(people[i, "first_name"], people[i, "last_name"]),
                   group = as.integer(factor(people[i,"party"])),
                   size = rowSums(m[ids, ids]))

To create our JSON (JavaScript Object Notation) file, we put these two variables together in a list and output it as JSON with, e.g.,

cat(jsonlite::toJSON( list(nodes = Nodes, links = Links)), file = "senateMatrix.json")

Now we have the data in the format that the JavaScript code expects. This is also the format used by numerous other plots, including the forceNetwork() function in the networkD3 package. So we would have to do this for most of the plots we want to create, and we can use the result across several different renderings without changing the JSON content.
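As an illustration of that format, the sketch below builds a tiny object with the same nodes and links structure and checks that every link refers to a valid 0-based node index. The names and values are invented, and checkNetwork is a hypothetical helper, not part of the original code.

```javascript
// Hypothetical content in the shape of senateMatrix.json.
var example = {
  nodes: [
    {name: "Senator A", group: 1, size: 2},
    {name: "Senator B", group: 2, size: 1},
    {name: "Senator C", group: 1, size: 1}
  ],
  links: [
    {source: 0, target: 1, value: 3},
    {source: 0, target: 2, value: 5}
  ]
};

// Return a list of problems; an empty list means the structure looks usable.
function checkNetwork(data) {
  var problems = [];
  if (!Array.isArray(data.nodes) || !Array.isArray(data.links)) {
    return ["nodes and links must both be arrays"];
  }
  var n = data.nodes.length;
  data.links.forEach(function(link, k) {
    ["source", "target"].forEach(function(end) {
      var idx = link[end];
      if (!Number.isInteger(idx) || idx < 0 || idx >= n) {
        problems.push("link " + k + ": " + end + " = " + idx +
                      " is not a valid 0-based index");
      }
    });
  });
  return problems;
}

// Simulate writing the file and reading it back.
var roundTrip = JSON.parse(JSON.stringify(example));
var problems = checkNetwork(roundTrip);
```

A check like this is a cheap way to catch a forgotten subtraction of 1 before the plotting code fails with a less obvious error.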

Understanding the HTML Document

An annotated version of the original document is here:

<!DOCTYPE html>
<html>
<meta charset="utf-8">
<script src="//d3js.org/d3.v2.min.js" charset="utf-8"></script>

<aside style="margin-top:80px;">
<p>Order: <select id="order">
  <option value="name">by Name</option>
  <option value="count">by Frequency</option>
  <option value="group">by Cluster</option>
</select>
</aside>

<script>
var margin = {top: 80, right: 0, bottom: 10, left: 80},
    width = 720,
    height = 720;

var x = d3.scale.ordinal().rangeBands([0, width]),
    z = d3.scale.linear().domain([0, 4]).clamp(true),
    c = d3.scale.category10().domain(d3.range(10));

/* Create the SVG drawing area (<svg>) and position based on the values in margin.
   We add a <g> element as the sole child of <svg> element. This is where
   we will add the text for the row and column labels and the rows. */
var svg = d3.select("body").append("svg")
    .attr("width", width + margin.left + margin.right)
    .attr("height", height + margin.top + margin.bottom)
    .style("margin-left", -margin.left + "px")
    .append("g")
    .attr("transform", "translate(" + margin.left + "," + margin.top + ")");


/* Read the data for the nodes and edges and create the matrix display.
   The function passed as the second argument is called after the
   data have been downloaded (if necessary) and read into memory.
   As such, it is an asynchronous call. This means
   other code within this <script> element cannot use variables
   or objects it creates because the code in the function
   won't have been run yet.
*/

d3.json("miserables.json", function(miserables) {
  var matrix = [],
      nodes = miserables.nodes,
      n = nodes.length;

  // Compute index per node.
  nodes.forEach(function(node, i) {
    node.index = i;
    node.count = 0;
    matrix[i] = d3.range(n).map(function(j) { return {x: j, y: i, z: 0}; });
  });

  // Convert links to matrix; count character occurrences.
  miserables.links.forEach(function(link) {
    matrix[link.source][link.target].z += link.value;
    matrix[link.target][link.source].z += link.value;
    matrix[link.source][link.source].z += link.value;
    matrix[link.target][link.target].z += link.value;
    nodes[link.source].count += link.value;
    nodes[link.target].count += link.value;
  });

  // Precompute the orders.
  // These are for the ordering by Frequency, by Name, by Cluster
  var orders = {
    name: d3.range(n).sort(function(a, b) { return d3.ascending(nodes[a].name, nodes[b].name); }),
    count: d3.range(n).sort(function(a, b) { return nodes[b].count - nodes[a].count; }),
    group: d3.range(n).sort(function(a, b) { return nodes[b].group - nodes[a].group; })
  };

  // The default sort order.
  x.domain(orders.name);

    // Create the background rectangle that covers the entire <svg> element.
  svg.append("rect")
      .attr("class", "background")
      .attr("width", width)
      .attr("height", height);

    /*  Now we are ready to create the rows of the matrix.
        We select the .row objects (by class), which don't exist yet, then bind the data
        to them and iterate to create a <g>, the sequence of <rect> elements
        and then the <text> for the row label.
        The looping here happens after the call to enter(), but the majority of the work
        is done in the function named row (not to be confused with the variable named row
        that we create here).
        Note that the data we bind is the matrix.  This is an array
        of rows. Each row is also an array.
        Note that we create the <g> for housing the <rect>, <text> and <line>
        elements. By positioning it (via a transform attribute), the children
        can be specified relative to the <g> and hence not transformed themselves. */
  var row = svg.selectAll(".row")
      .data(matrix)
      .enter()
      .append("g")
      .attr("class", "row")
      .attr("transform", function(d, i) { return "translate(0," + x(i) + ")"; })
      .each(row);  // So this calls the row function below for each of the rows in the matrix.

     // Draw a line under all the <rect> elements, i.e. separating this <g> from the next row's <g>
  row.append("line")
      .attr("x2", width);

     /* Put the row names on the left of the row. These will be the last elements in the <g>
        for the row, but we specify the x location so the position in the tree is not important. */
  row.append("text")
      .attr("x", -6)
      .attr("y", x.rangeBand() / 2)
      .attr("dy", ".32em")
      .attr("text-anchor", "end")
      .text(function(d, i) { return nodes[i].name; });


       /*  This is the row() function that we arranged to call to build the <rect> for
           each cell in this row of the matrix.
           The function starts by selecting the newly created DOM element, namely the <g> for this
           row. Then it creates the selection for the soon to be created cells and iterates over
           them  and the data to create the <rect> elements. What are the data here?
           It is a row of the matrix
         */
  function row(row) {
    var cell = d3.select(this).selectAll(".cell")
        .data(row)                 /* the original used row.filter(function(d) { return d.z; }) */
        .enter()
        .append("rect")
        .attr("class", "cell")
        .attr("x", function(d) { return x(d.x); })
        .attr("width", x.rangeBand())   // this is the x created at the top of the script (not the
                                        // x attribute created in the line above).
        .attr("height", x.rangeBand())
        .style("fill-opacity", function(d) { return z(d.z); })
             // set the color based on the row's group
        .style("fill", function(d) { return nodes[d.x].group == nodes[d.y].group ?  c(nodes[d.x].group) : null; })
             // set event handlers
        .on("mouseover", mouseover)  
        .on("mouseout", mouseout);
  }


      /* Next create the <g> elements for the columns. These will span the entire height
         of the matrix. This allows us to have events on the row and column to highlight
         the names of the two senators when the mouse is over a cell.
         The next three statements work with the column selection and create the <g>,
         then the <line> separating the columns and then the <text> for the column
         label.
        */
  var column = svg.selectAll(".column")
      .data(matrix)
      .enter()
      .append("g")
      .attr("class", "column")
      .attr("transform", function(d, i) { return "translate(" + x(i) + ")rotate(-90)"; });

  column.append("line")
      .attr("x1", -width);

  column.append("text")
      .attr("x", 6)
      .attr("y", x.rangeBand() / 2)
      .attr("dy", ".32em")
      .attr("text-anchor", "start")
      .text(function(d, i) { return nodes[i].name; });


  function mouseover(p) {
    d3.selectAll(".row text").classed("active", function(d, i) { return i == p.y; });
    d3.selectAll(".column text").classed("active", function(d, i) { return i == p.x; });
  }

  function mouseout() {
    d3.selectAll("text").classed("active", false);
  }


    /* Set the event handler for the HTML pull-down menu.*/

  d3.select("#order").on("change", function() {
    clearTimeout(timeout);
    order(this.value);  // call this with the newly selected value from the pulldown menu.
  });


     /* This arranges to reorder the rows and columns to  show the new display. */
  function order(value) {
    x.domain(orders[value]);

        // create an animation/transition to show the rows and columns moving.
    var t = svg.transition().duration(2500);

       // move the rows and columns simultaneously.

    t.selectAll(".row")
        .delay(function(d, i) { return x(i) * 4; })
        .attr("transform", function(d, i) { return "translate(0," + x(i) + ")"; })
        .selectAll(".cell")
        .delay(function(d) { return x(d.x) * 4; })
        .attr("x", function(d) { return x(d.x); });

    t.selectAll(".column")
        .delay(function(d, i) { return x(i) * 4; })
        .attr("transform", function(d, i) { return "translate(" + x(i) + ")rotate(-90)"; });
  }

    /* This arranges to change to the by Cluster view after 5 seconds from when the page is displayed.*/
  var timeout = setTimeout(function() {
                             order("group");
                             d3.select("#order").property("selectedIndex", 2).node().focus();
                           }, 5000);
});
</script>
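The step that converts the links into the matrix can be tried on its own, without D3 or a browser. The sketch below repeats that logic with plain loops on a made-up three-node network; buildMatrix is a name introduced here for illustration.

```javascript
// Build the n x n array of cells {x: column, y: row, z: total} and
// accumulate each node's count, following the logic in the script above.
function buildMatrix(nodes, links) {
  var n = nodes.length, matrix = [];
  nodes.forEach(function(node, i) {
    node.index = i;
    node.count = 0;
    matrix[i] = [];
    for (var j = 0; j < n; j++) matrix[i][j] = {x: j, y: i, z: 0};
  });
  links.forEach(function(link) {
    matrix[link.source][link.target].z += link.value;
    matrix[link.target][link.source].z += link.value;
    matrix[link.source][link.source].z += link.value;   // diagonal cells
    matrix[link.target][link.target].z += link.value;
    nodes[link.source].count += link.value;
    nodes[link.target].count += link.value;
  });
  return matrix;
}

// Made-up data: three nodes, two links.
var demoNodes = [{name: "A"}, {name: "B"}, {name: "C"}];
var demoLinks = [{source: 0, target: 1, value: 2}, {source: 1, target: 2, value: 1}];
var demoMatrix = buildMatrix(demoNodes, demoLinks);
```

Each link is added in both directions, so a link list that already contains both (a, b) and (b, a), as an edge list derived from a symmetric adjacency matrix treated as directed would, contributes to each cell twice.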





Firstly, there is a pull-down menu with three options: by Name, by Frequency and by Cluster. When the viewer changes this, the matrix reorganizes itself. This involves having the HTML content change the plot. So there is some interaction between two quite different elements of the overall page - the text and the plot.

There is no event handler attribute on the <select> element in the HTML itself. Instead, near the bottom of the in-lined JavaScript code we have

 d3.select("#order").on("change", function() {
   clearTimeout(timeout);
   order(this.value);
 });

This queries the <select> element by its id attribute and adds an event handler for any changes made to it. The handler calls the order function.

We don't have to change this code for the pull-down menu as it continues to do exactly what we want with our new data. It is independent of the actual data content.
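What the menu selects among are the precomputed orders: each is a permutation of the node indices, and the scale places node i at the slot where i appears in the current order. The following sketch reproduces this without D3 for three made-up nodes. The helper names are hypothetical, and bandPosition assumes equal-width bands with no padding between them.

```javascript
function range(n) {
  var out = [];
  for (var i = 0; i < n; i++) out.push(i);
  return out;
}

// The three orderings, as permutations of the node indices.
function computeOrders(nodes) {
  var n = nodes.length;
  return {
    name: range(n).sort(function(a, b) {
      return nodes[a].name < nodes[b].name ? -1 : nodes[a].name > nodes[b].name ? 1 : 0;
    }),
    count: range(n).sort(function(a, b) { return nodes[b].count - nodes[a].count; }),
    group: range(n).sort(function(a, b) { return nodes[b].group - nodes[a].group; })
  };
}

// Offset of node i along an axis of the given width when the
// scale's domain is `order`.
function bandPosition(order, i, width) {
  return order.indexOf(i) * (width / order.length);
}

var orderNodes = [
  {name: "C", count: 5, group: 1},
  {name: "A", count: 9, group: 2},
  {name: "B", count: 1, group: 1}
];
var orderDemo = computeOrders(orderNodes);
```

Switching the menu from one order to another therefore only changes the slot each node index maps to, which is why the same code works for any data with name, count and group fields.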

Adding Hyperlinks to the People's Names

To add hyperlinks to the row labels (i.e. the names of the senators appearing for each row), we can adapt the JavaScript code

  row.append("text")
      .attr("x", -6)
      .attr("y", x.rangeBand() / 2)
      .attr("dy", ".32em")
      .attr("text-anchor", "end")
      .text(function(d, i) { return nodes[i].name; });

Instead of <text> elements, we want to create <a> elements for a hyperlink and within this have the <text> element. For the <a> element, we add an href attribute to specify the destination of the hyperlink. For SVG, we actually use xlink:href and not the simple HTML href attribute. We can do this with the following change to the code above:

   row.append("a")
     .attr("xlink:href", function(d, i) { return nodes[i].name; })
      .append("text")
      .attr("x", -6)
      .attr("y", x.rangeBand() / 2)
      .attr("dy", ".32em")
      .attr("text-anchor", "end")
      .text(function(d, i) { return nodes[i].name; });

This makes the text clickable. Unfortunately, the link is wrong. The person's name does not bring us to a proper Web page. Instead, we need to map this to a page on govtrack.us for example. Here, there is a page for each senator using the template URL

http://govtrack.us/congress/members/firstName_lastName/govtrack_id

The govtrack_id is the identifier for the senator that we saw in the edge list. We can have the JavaScript code create these URLs dynamically, or alternatively we could create them in R and store them in each node. (Or we can do both!) For the former, we need the govtrack_id for each node. So when we create the data frames in R to export to JSON, we add this as a column in the Nodes data frame. This is as simple as

Nodes$govtrack_id = people$govtrack_id[i]

since i was the vector of indices matching these 100 senators. Similarly, we could add the URLs with

Nodes$url = sprintf("http://govtrack.us/congress/members/%s_%s/%s", 
                                   tolower(people$first_name[i]), 
                                   tolower(people$last_name[i]),
                                   people$govtrack_id[i])

We change our JavaScript code above to

   row.append("a")
     .attr("xlink:href", function(d, i) { return makeGovLinkURL(nodes[i]) })
     .append("text")

The function makeGovLinkURL creates the URL, taking care to convert the name to lower case and replacing a space with a _. This code is in senateMatrixLinksGovtrackColoredNames.html and here:

/* Create the GovTrack.us link for this member of congress. */
function makeGovLinkURL(node)
{
   return "http://govtrack.us/congress/members/" +  mkName(node.name) + "/" + node.govtrack_id;
}

function mkName(name)
{
  return( name.replace(" ", "_").toLowerCase() ) ;
}
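These two functions can be tried outside the page. The sketch below repeats them and adds a hypothetical variant, mkNameAll, because replace() with a string pattern changes only the first space in a name. The names and id are invented, and whether GovTrack expects underscores throughout a multi-word name is an assumption to check against real URLs.

```javascript
// As above: a string pattern replaces only the first match.
function mkName(name) {
  return name.replace(" ", "_").toLowerCase();
}

// Hypothetical variant: trim, then replace every run of whitespace.
function mkNameAll(name) {
  return name.trim().replace(/\s+/g, "_").toLowerCase();
}

// Same URL template as above; the optional second argument selects
// the name conversion and defaults to mkName.
function makeGovLinkURL(node, toName) {
  var convert = toName || mkName;
  return "http://govtrack.us/congress/members/" + convert(node.name) + "/" + node.govtrack_id;
}

var simple = makeGovLinkURL({name: "Jane Doe", govtrack_id: 123456});
var multi = makeGovLinkURL({name: "Jane Van Doe", govtrack_id: 123456});
var multiAll = makeGovLinkURL({name: "Jane Van Doe", govtrack_id: 123456}, mkNameAll);
```

With a three-word name, multi still contains a space after the first underscore, while multiAll does not.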

We do the same for the column labels, changing the code

column.append("text")
      .attr("x", 6)

to

column.append("a")
     .attr("xlink:href", function(d, i) { return makeGovLinkURL(nodes[i]) })  
      .append("text")

Changing the Colors of the Row and Column Labels

The goal here is to use the party affiliation to color the senators' names. We can do this in various ways. We'll choose to use CSS. The names are within <text> elements. When we create these elements, we'll set the class to be Democrat, Republican or Independent. We do this with the following one-expression addition (the call that sets the class attribute)

  row.append("a")
      .attr("xlink:href", function(d, i) { return makeGovLinkURL(nodes[i]) })
      .append("text")
      .attr("x", -6)
      .attr("y", x.rangeBand() / 2)
      .attr("dy", ".32em")
      .attr("text-anchor", "end")
     .attr("class", function(d, i) { return nodes[i].party;  } )
      .text(function(d, i) { return nodes[i].name; });

This extracts the party field for each node and uses that as the name of the class. We didn't yet add that to each node in our JSON data, but we can do so easily in R with

Nodes$party = people[i, "party"]

and then export Nodes and Links to JSON as we did above.

Setting the class does not change the color of the labels. We also have to add directives for this which we do in CSS in the HTML document with

text.Democrat {
   fill: blue;
}

text.Republican {
   fill: red;
}

text.Independent {
   fill: green;
}

We put these in the <style> element in the <head> of the <html> document.

The results are visible in senateMatrixLinksGovtrackColoredNames.html.

Adding Tooltips to the <rect> Elements

It would be convenient to be able to mouse over a cell in the matrix and hover long enough so that a tooltip appears telling us the names of the two senators for that cell (the row and column labels) and the number of co-sponsored bills they appeared on.

Creating tooltips is different for different browsers. For Firefox, we can just add a title attribute to an element. For Google Chrome, we have to add a <title> element as a child of the element, e.g., of each <rect>.

function row(row, rowNum) {
   var cell = d3.select(this)
                .selectAll(".cell")
                .data(row)
                .enter()
                .append("rect")
                .attr("class", "cell")
                   ....
              .each(function(d, i) { return addTooltip(d3.select(this), i, rowNum) ; });

This just calls each with a function that takes both the data and the column number. We modified the row function to also receive the row number, which we need to identify the row and column for the cell and hence look up both labels and the count. The function calls addTooltip, which we also added, to create the <title> element with the relevant text for the tooltip. It computes the text via the row and column numbers. However, addTooltip also needs to know to which DOM element to append the new <title> element. So we pass it the newly created <rect> element. We get this via the call

d3.select(this)

See Mastering D3.js p25.

The addTooltip function is defined as

function addTooltip(obj, i, j) {
   obj.append("title").text( nodes[j].name + " " + nodes[i].name + ": " + matrix[j][i].z);
   return obj;
}
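The text of the tooltip can be checked separately from the DOM work. This sketch isolates the string construction in a hypothetical tooltipText function and runs it on a made-up 2 x 2 matrix. Note that z holds whatever was exported in each link's value field; with the quintile groups computed earlier, that is a group number rather than a raw count of bills.

```javascript
// Same text as addTooltip builds: row name, column name, then the cell's z.
function tooltipText(nodes, matrix, col, rowNum) {
  return nodes[rowNum].name + " " + nodes[col].name + ": " + matrix[rowNum][col].z;
}

// Made-up data in the same shape as nodes and matrix in the page.
var tipNodes = [{name: "Senator A"}, {name: "Senator B"}];
var tipMatrix = [
  [{x: 0, y: 0, z: 0}, {x: 1, y: 0, z: 3}],
  [{x: 0, y: 1, z: 3}, {x: 1, y: 1, z: 0}]
];

var tip = tooltipText(tipNodes, tipMatrix, 1, 0);
```

To show the actual number of co-sponsored bills, one option would be to export the raw count as an additional field on each link and display that instead.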

The original matrix display by Mike Bostock had a filter for the cells in a row of the matrix. The code for the row function started with

function row(row) {
    var cell = d3.select(this).selectAll(".cell")
        .data(row.filter(function(d) { return d.z; }))
        .enter()
         ...

This was sensible as it does not create a <rect> for any cell in the matrix with a count of 0. Unfortunately, this means that the column numbers passed to our functions no longer correspond to the array of senators. For example, if the second cell of a row in the matrix is 0, then that cell is discarded. The first rectangle created corresponds to the first senator, but the second rectangle corresponds to the third senator. To avoid this and ensure that we can map to the relevant senator, we do not filter out any of the 0-count cells. Alternatively, we could keep the filter and use the index stored in each cell: the cell objects are created as {x: j, y: i, z: 0}, so the data value (the variable d) carries its column in d.x and its row in d.y.
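The mismatch is easy to reproduce with a single made-up row of three cells in which the middle one has a count of 0:

```javascript
// One row of the matrix: the second cell has z = 0.
var cells = [
  {x: 0, y: 0, z: 4},
  {x: 1, y: 0, z: 0},
  {x: 2, y: 0, z: 2}
];

// The filter used in the original example drops the zero cell.
var kept = cells.filter(function(d) { return d.z; });

// The index passed to callbacks is the position within the filtered array...
var byPosition = kept.map(function(d, i) { return i; });

// ...whereas the x field stored in each cell still names the true column.
var byField = kept.map(function(d) { return d.x; });
```

Here byPosition is [0, 1] while byField is [0, 2], so a lookup such as nodes[i] would name the wrong senator for the second rectangle, and nodes[d.x] would not.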

Clicking on a Cell

To add the ability to click on a cell in the matrix and have an action occur, we need to add an event handler for the click event for each rectangle. We'll need the row and column indices for the cell. This is basically what we did for the tooltip, but simpler as we are not adding a new element. So we can add this to our row function with the following line

  function row(row, rowNum) {
    var cell = d3.select(this).selectAll(".cell")
        .data(row)
        .enter()
        .append("rect")
         ....
        .on("mouseover", mouseover)
        .on("mouseout", mouseout)
       .on("click",  function(val, i) { return clickCell(val, i, rowNum) })
        .each(function(d, i) { return addTooltip(d3.select(this), i, rowNum) ; });

This adds the event handler using the on method. The function takes the value of the data object (our matrix cell) and the column number since that is what we are looping over here. We also get the row number from the row function and then we pass these to our actual event handler function which we call clickCell. This can do whatever it wants such as adding content to another part of the HTML document.
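As one possibility, here is a sketch of a clickCell that collects the senators linked to the person in the clicked row, which is the information needed for the highlighting feature listed at the start. Everything in it is an assumption for illustration: the helper linkedNames, the factory makeClickCell (used so the example has nodes and matrix in scope), the element id "details", and the made-up data.

```javascript
// Hypothetical helper: names of the senators linked to the one in row rowNum.
function linkedNames(nodes, matrix, rowNum) {
  return matrix[rowNum]
    .filter(function(d) { return d.z > 0 && d.x !== rowNum; })
    .map(function(d) { return nodes[d.x].name; });
}

// Returns a handler with the (val, i, rowNum) arguments used above.
function makeClickCell(nodes, matrix) {
  return function clickCell(val, i, rowNum) {
    var info = {
      row: nodes[rowNum].name,
      column: nodes[i].name,
      z: val.z,
      linked: linkedNames(nodes, matrix, rowNum)
    };
    // In a browser, show the result in an element with id "details",
    // if the page has one (an assumed id, not in the original document).
    if (typeof document !== "undefined") {
      var target = document.getElementById("details");
      if (target) target.textContent = info.row + " is linked to: " + info.linked.join(", ");
    }
    return info;
  };
}

// Made-up data: Senator A is linked to Senator B only.
var clickNodes = [{name: "Senator A"}, {name: "Senator B"}, {name: "Senator C"}];
var clickMatrix = [
  [{x: 0, y: 0, z: 5}, {x: 1, y: 0, z: 2}, {x: 2, y: 0, z: 0}],
  [{x: 0, y: 1, z: 2}, {x: 1, y: 1, z: 2}, {x: 2, y: 1, z: 0}],
  [{x: 0, y: 2, z: 0}, {x: 1, y: 2, z: 0}, {x: 2, y: 2, z: 0}]
];
var clickCell = makeClickCell(clickNodes, clickMatrix);
var clicked = clickCell(clickMatrix[0][1], 1, 0);
```

Inside the page itself, nodes and matrix are already visible to functions defined within the d3.json callback, so the factory would not be needed there.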