orbifold think. visualize. understand.

Diagram analysis, large trees and pretty pictures.

Diagrams and graphs are useful to visualize data. If the data remains below a certain degree it's easy to manipulate yourself the nodes but what about thousands of nodes? What if you wish to navigate large amounts of data, filter things out or traverse a tree? Welcome to graph analysis, algorithmic layout and mutli-threaded processing of large data blobs!


Layout and graph analysis is a world on its own. Many research institutes around the world dedicate their time to elaborate complex algorithms and, indeed, to answer the question: what makes a layout of a diagram appealing or useful? The answer depends of course on your data and what you're looking for; relationships, clustering phenomena, trends...? So, there is in general no unique solution and even if you do find a useful layout algorithm there are many paremeters to tweak in function of the data.


Layout algorithms make heavily use of mathematical topics; spanning tree, cyclic graphs, Hamiltonian property, coloring...and so on. Graph analysis is all about abstracting away the details of a diagram to deduce certain properties and infer others. For example, when is a diagram tree-like? Xml data is obviously tree-like but can you lay out a generic diagram in a tree-like fashion? How do you traverse a diagram without visiting a node twice? This so-called cyclicity of a diagram is directly related to infinte loops in C# or, if you prefer, the (im)possibility to implement the IEnumerable interface on a traversal of the data.


In previous versions of the Netron library you could, with various levels of success, use a few layout algorithms and the graph analysis library evolved from a separate assembly to an integrate sub-library. It's always been, however, rather a difficult marriage between typical flowcharting features (properties, grouping, online editing...) and features related to data visualization (layout, filtering...). In a way, these are two extremes; if you are interested in large visualizations you don't care about shape-level features and if you are interested in shape-level features (data simulations, pretty shapes, sub-shape elements) you don't care about graph analysis. I beleive the solution is to have a diagramming control which suits your needs and in this respect the Netron project is still much more a 'toolbox' or 'how you could do it' rather than an 'all in one solution'. On top of that there are other features like the internal data-flow mechanism which represents yet another pole of attraction is the code. Unfortunately I cannot maintain multiple diagramming libraries at the same time and the best compromise I have found is to make the library modular by means of interfaces, abstract base classes and partial classes. So far for the timid annoucement of an otherwise magnificent addition.


In the past months I have gone through much efforts to upgrade and improve the old graph analysis library, add new layout algorithms and test various approaches. Generics has been always the missing link in this context, iterators a much needed feature and I'm glad to say that things are way more easier now than before. Among other things:


  • the analysis code is now integrated in the (MVC) model but somewhat hidden by means of an explicit interface implementation
  • like everything else in the new library, graph analysis is interface based
  • layout is multi-threaded, based on the BackgroundWorker class
  • the spanning tree calculation lies at the hearth of many layout algorithms and has been simplified
I cannot avoit at this point to mention the Prefuse project from which I have benfited a lot while exploring layout.


Before showing off some things I'd like to give a message of warning. As I said before, layout algorithms are the playground of research teams and the more you venture into it, the more you'll find topics like partial differential equations, discretization of sets of equations, constraint and boundary value techniques, Runge-Kutta integration, the N-body problem and so on. The underlying idea is that embedding physics (springs, gravitation, wall forces...) in a diagram leads to symmetry, natural motions and (hence) layout designs looking like (real) trees, flowers and symmetrical shapes. It is then no accident that in the pictures below you'll recognize something of the percolating rain on a window, the cracked surface of glass or the clustering balloon-like bubbles of a big molecule. There is also a surprising overlap here with the field of games and artificial intelligence. The integration techniques (Euler integration, Barnes-Hut...) have indeed much resemblance with the particle effects or physical simulations in 3D gaming environments. Anyway, if you have some background in maths and physics, there is much to be learned and the edge between the diagrammatic world and the realtime world of games is absent.


Alright, the fun part now. Below are some screenshots of recent experiments with the various layout algorithms. My development machine is a P4 3GHz with hyperthreading and 1Gb or RAM. The layout took between ten seconds and two minutes. The amount of nodes (shapes) was between 3.000 and 20.000 (!). The time is varying between an O(x) and O(x log(x)) complexity, different algorithms different significantly in performance. In some cases I deliberately choose to suppres the painting of the shapes to contrast the layout since the overlapping shapes often obscure the edges on a large scale. The scaling is between 10% and 0.01% of the original painting. Finally, I designed a more-or-less smart random-tree generator to produce a balanced tree graph in each case. The layout algorithms all function with non-tree data but in a benchmark like this it would only blur the actual intricaties of the layout.


The balloon layout. This layout places children nodes radially around their parents, and is equivalent to a top-down flattened view of a cone-tree. It has something of a tree-branch in spring-time.


Balloon layout


The force-directed layout. This is a force-driven algorithm where the shapes and connections are subjected to different forces like repulsion, gravitation, spring forces. In its simplest form it amounts to an embedding of springs in the connections but the code is modular and you can integrate virtually any kind of force in the model (i.e. any kind of force potential in the discretized differential equation).


Force directed layout


Fruchterman-Reingold layout. This layout algorithm was already present in previous Netron libraries but has been rewritten and integrated, producing an overall much more satisfactory result.


Fruchterman-Reingold layout


Radial-tree layout. This layout computes a radial layout, laying out subsequent depth levels of a tree on circles of progressively increasing radius. The effect is similar to the balloon layout except that the subsequent levels are layed out as shells around a common center rather than around the center of the parent node.


Radial tree layout


Classic tree layout. This layout produces the classic hierarchy of a tree. The code is different than the one you can find in my CodeProject article (i.e. Lithium library) and is very fast; many thousands of nodes are layed out in seconds. In addition, you can choose the layout direction with just an enum switch.


Classic tree layout


All this code will be available somewhere in the near future and I intend to release yet another preview before the final, so stay tuned. Not sure if the data-flow code will the next addition; anybody interested?


People who have signed in for the v2.5 preview will receive an Email when the v2.8 is ready for download. Meanwhile you can still sign in here. Note that the registration is based on you Email address; if you sign-in with your (previous registered) Email address (whatever the first name and last name is) the system will recognize you and allow you download or re-download the code. I have however noticed some problems with certain domains; some Emails do not reach their recipient and I get the bounced Email back in my box. There is little I can do about it, Brinkster's mailserver seems to block some domains (spam list?). Please, if you donnot receive a confirmation of your registration use another Email address.


Happy and blessed Eastern to all!