Content Groups and Network Graphs

Content groups are certainly one of the Google Analytics features I find interesting. Setting them up can be involved, though using the tracking code approach combined with GTM can greatly simplify matters. There is a lot that can be written about planning content groups, executing on them and then using the data available via the Google Analytics API to produce meaningful analysis, and this subject is certainly worth a few blog posts. For now though, let's look at visualising the data.

Visualising networks can be very interesting and in some ways challenging. There are a lot of good examples of different approaches to this problem, including plenty of material in the d3.js gallery, any number of R packages (like qgraph) and other tools. Suffice it to say, there has always been interest in connecting nodes via edges in interesting and informative ways.

Selecting the best way to explain the relationship between different things based on some form of value metric can be problematic, especially with detailed data.

Site Sections to Site Sections

The following examples are based on content groups for another site. The dimensions used are previousContentGroup1 and nextContentGroup1, with a log-transformed pageview total as the value metric. Unfortunately, the site I based this on has not been updated frequently and as a result has very little traffic. One of the most important things to keep in mind with this data is that it is bi-directional: traffic from one group to another is recorded separately from traffic in the opposite direction. The graph needs to differentiate between the two directions, which rules out a few possibilities, such as the following, created using a now-defunct web service called impure.com.

A network of tags and posts
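To make the shape of the data concrete, here is a minimal sketch of turning rows of previous group, next group and pageviews into a directed edge list. The sample rows, group names and the `buildDirectedEdges` helper are all hypothetical, and `Math.log1p` is only one possible choice of log transform, not necessarily the one used for the graphs in this post.

```javascript
// Hypothetical rows: [previous group, next group, pageviews].
const rows = [
  ["Blog", "Projects", 120],
  ["Projects", "Blog", 45],
  ["Blog", "Blog", 300],
  ["Blog", "Projects", 30],
];

// Sum pageviews per ordered pair, so "Blog -> Projects" and
// "Projects -> Blog" stay separate edges.
function buildDirectedEdges(rows) {
  const totals = new Map();
  for (const [source, target, pageviews] of rows) {
    const key = JSON.stringify([source, target]);
    totals.set(key, (totals.get(key) || 0) + pageviews);
  }
  return Array.from(totals, ([key, pageviews]) => {
    const [source, target] = JSON.parse(key);
    // Assumption: log1p as the transform, so a zero total maps to 0.
    return { source, target, pageviews, value: Math.log1p(pageviews) };
  });
}

const edges = buildDirectedEdges(rows);
console.log(edges);
```

Keying on the ordered pair rather than the unordered pair is what preserves the direction of travel between groups.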

One method of dealing with this complexity is to use interactive visualisations; the JavaScript library d3.js is one of the better-known examples of this approach.

Force Node Layout

Quick node graph created in R

Creating the above graph is very easy in R. The code used to create it (minus API access details) can be found on GitHub here. The edges are weighted by the volume of traffic moving from one area to another. In this example the exact number has been transformed to minimise the absolute difference between the edge widths.
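As a rough illustration of why a transform helps with edge widths, the sketch below maps the same set of values onto a stroke-width range with and without a log transform. The pageview numbers, the width range and the `scaleWidths` helper are assumptions made for this example; the R code linked above may well do this differently.

```javascript
// Map values linearly onto a stroke-width range (hypothetical helper).
function scaleWidths(values, minWidth = 1, maxWidth = 8) {
  const lo = Math.min(...values);
  const hi = Math.max(...values);
  // If every value is identical, use the middle of the range.
  if (hi === lo) return values.map(() => (minWidth + maxWidth) / 2);
  return values.map(
    v => minWidth + ((v - lo) / (hi - lo)) * (maxWidth - minWidth)
  );
}

// Made-up pageview totals with one dominant edge.
const pageviews = [3, 40, 2500];

const rawWidths = scaleWidths(pageviews);
const logWidths = scaleWidths(pageviews.map(v => Math.log(v)));

console.log(rawWidths); // the middle edge is barely wider than the smallest
console.log(logWidths); // the middle edge is clearly distinguishable
```

Without the transform, one busy path dominates and every other edge collapses towards the minimum width.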

The force node graph above was based on material and examples from D3.js Tips and Tricks. This graph was created based on the same data as the one above. One of the main differences between the two is how each graph deals with a node linking to itself. The R example handles that by default; the d3.js example would need further work to account for this.
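One possible first step towards that further work is to separate self-links from ordinary links before handing the data to a layout, since a straight line from a node to itself has no length and so cannot be seen. The sketch below is only an illustration: `splitSelfLinks` and the sample data are hypothetical, and how the self-links are then drawn (a loop, node size, a label) is left open.

```javascript
// Separate edges that start and end on the same group from the rest.
function splitSelfLinks(edges) {
  const links = [];
  const selfLinks = [];
  for (const edge of edges) {
    (edge.source === edge.target ? selfLinks : links).push(edge);
  }
  return { links, selfLinks };
}

// Hypothetical edge list in source/target/value form.
const sampleEdges = [
  { source: "Blog", target: "Projects", value: 5.0 },
  { source: "Blog", target: "Blog", value: 5.7 },
  { source: "Projects", target: "Blog", value: 3.8 },
];

const { links, selfLinks } = splitSelfLinks(sampleEdges);
console.log(links.length, selfLinks.length);
```

The `links` array can be used as before, while `selfLinks` is available for whatever separate treatment suits the graph.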

Chord Graph

The above graph is based on examples like the Uber Rides by Neighborhood and Chord Diagram examples. This was also created using the same data set, though in this case it was converted to a matrix rather than used as a simple CSV listing source, target and value by row. The same transformations were applied to pageviews as seen in the force node graph.

Unlike the others, this graph makes it far easier to drill down to a single group. In the more detailed examples linked to above, this makes it very easy to focus on the areas of interest in a more crowded visualisation.
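The conversion from a source, target and value listing to a matrix, as used for the chord graph, can be sketched roughly as follows. The `toMatrix` helper and the sample values are hypothetical; the layout chosen here, where row i and column j hold the flow from group i to group j, is the convention a chord layout generally works with.

```javascript
// Build a square matrix from a source/target/value edge list.
function toMatrix(edges) {
  // Collect every group name once, in a stable (sorted) order.
  const names = [...new Set(edges.flatMap(e => [e.source, e.target]))].sort();
  const index = new Map(names.map((name, i) => [name, i]));
  const matrix = names.map(() => names.map(() => 0));
  for (const { source, target, value } of edges) {
    // Row = where the traffic came from, column = where it went.
    matrix[index.get(source)][index.get(target)] += value;
  }
  return { names, matrix };
}

// Hypothetical (untransformed) values for readability.
const chordEdges = [
  { source: "Blog", target: "Projects", value: 150 },
  { source: "Projects", target: "Blog", value: 45 },
  { source: "Blog", target: "Blog", value: 300 },
];

const { names, matrix } = toMatrix(chordEdges);
console.log(names);
console.log(matrix);
```

Because the matrix is not symmetric, both directions between two groups survive the conversion, and traffic within a group sits on the diagonal.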

The point

While data visualisations can be interesting, what has not yet been addressed here is what value they have, what decisions they make possible or what insights they provide. The value that you can get from visualisations depends on everything that happened before. In the case of the content group examples above, it starts with why the pages were assigned to the groups they were, why those groups exist and what they are supposed to represent, and runs through to how the data is collected, collated and finally processed.
