Twirl: like Live Pivot but better

We re-invented Live Pivot and took away the limitations along the way.

A recurring issue in our projects is the scalability of diagramming solutions. Typically I get questions like “How many nodes can one display in this or that framework?” and “Will the graph layout still be fluid and interactive if we add a thousand items?” These questions are legitimate but not always straightforward to answer. There are several aspects to consider:

  • The usability aspect: how meaningful is a diagram with thousands of nodes, and does the user still get insights with potentially a million links between so many nodes? Here I have to think of how useless the Visual Studio 2010 architecture visualizer is if you try to display a big assembly. See the picture below, which displays a small portion of the G2 architecture: 20,000 nodes and 90,000 links… useful? I don’t think so.
  • The .NET framework aspect: out of the box, the .NET classes related to WPF and Silverlight (UIElement and FrameworkElement, that is) are ‘heavy’. These base classes contain plumbing for templating, databinding, triggers, animation and whatnot. It means that even if you don’t need any of this, your application has to carry a large amount of baggage around in memory. If you scale a diagramming solution to thousands of items, it shows. If one really wants to display a very large amount of data in a diagram, there is no other way but to find alternatives to the retained graphics pipeline of .NET and go back to the good old low-level mechanisms.
  • The granularity aspect: much like the usability aspect, is it meaningful to have a lot of animations and gimmicks when displaying large datasets? I suppose that if you wish to display large diagrams you are not interested in details and interactivity, or are you? Well, some customers are, and they want both a high-level overview and the ability to zoom down to a finer level of detail. This is difficult to achieve with current technology and average hardware. Displaying a million nodes with databound entities while remaining fluid is tough; the only solution is to switch from one algorithm to another as the user zooms. There is much to say about this, but it’s not the topic of this post.

In the end there are three solutions to these problems:

  • Drop Silverlight/WPF as the technology in which you try to visualize something and move to C/C++, sacrificing a lot of the magic these frameworks offer. While sometimes a viable solution for the customer, it is usually not one I like to push too much, for obvious reasons.
  • Compromise between scalability and functionality by emphasizing the usability aspect; decision makers want clear diagrams, and yet another cluttered interface is the last thing a company wants to pay for.
  • Come up with an ingenious or new way of visualizing data. Often the type of data and the business context dictate what ‘ingenious’ means here, since each type of data has its own peculiarities. So finding a generic solution along these lines is usually not easy.

While I describe all this in the context of diagramming, the remarks are valid for any data visualization solution. In the end, what matters most to a customer is having a solution; whether it’s a diagram or a different type of visualization often doesn’t matter.

Live Pivot falls into the last category. It is a new data visualization approach which handles the issues mentioned above quite well, even if the amount of data it handles is still moderate. There are no links or diagramming here, but rather a way to traverse a multi-dimensional dataset by filtering out data through various widgets and pivot-like charts, which are well known from Microsoft Excel. The name ‘Live’ Pivot is hence aptly chosen, as it displays pivot charts dynamically according to the user’s criteria. I highly advise you to take a look and play with it; it’s a free Silverlight framework and I’m sure we’ll see interesting applications of it in the near future.

Now, the reason for this post is that although I enjoyed Live Pivot in the beginning, I got more and more frustrated by its limitations as time passed:

  • The datapool on which the visualization is based is a variation of XML called CXML. Though Microsoft created various tools to support the creation of CXML, it remains a bit of an awkward format if you want to work with data coming from WCF or use the visualization with your legacy data entities. Even though it is theoretically possible to convert your entities in memory using the supplied assemblies, it remains a bit of a show-stopper when developing for Pivot.
  • The visuals in Live Pivot are based on the SeaDragon technology, which even for a small number of entities forces you to have huge amounts of server space, because for every image you need, say, a hundred copies of it in various resolutions. This is related to the Morton layout algorithm used internally to refine images as the user zooms into different levels of the dataset under consideration.
  • Updating or serving collections from a database is possible, but in practice no one is doing this because it involves undocumented wizardry (see for example the cryptic explanation on the Pivot site). I’m sure things will become easier in the future though.
  • The whole design and development process of Pivot collections is off the main development route; the framework doesn’t integrate well into a traditional N-tier application. In a way, creating a Pivot collection is more like designing a Photoshop template for a website: you do it once and then try not to change it, for fear of the consequences for the rest of the site.
  • Live Pivot is freely available and well developed but if you wish to customize things for a customer…well, you can’t since the source is not yours.
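
To put a rough number on the image-pyramid overhead mentioned above, here is a small back-of-the-envelope sketch in Python. It assumes a Deep Zoom style pyramid in which every level halves the resolution of the one above it and each level is cut into square tiles; the 256-pixel tile size is an assumption, tile overlap is ignored, and the collection-level thumbnails are not counted. The function name is made up for this illustration.

```python
def deep_zoom_tile_count(width, height, tile_size=256):
    """Return (number of levels, number of tile files) for one image.

    Assumes a pyramid where each level halves the previous one, down to
    1x1 pixel, and every level is cut into tile_size x tile_size tiles.
    """
    # ceil(log2(longest side)), computed with integers to avoid float error
    max_level = (max(width, height) - 1).bit_length()
    total_tiles = 0
    for level in range(max_level + 1):
        scale = 2 ** (max_level - level)
        level_w = max(1, -(-width // scale))    # ceiling division
        level_h = max(1, -(-height // scale))
        cols = -(-level_w // tile_size)
        rows = -(-level_h // tile_size)
        total_tiles += cols * rows
    return max_level + 1, total_tiles


print(deep_zoom_tile_count(1024, 1024))  # (11, 29)
print(deep_zoom_tile_count(4096, 4096))  # (13, 349)
```

Under these assumptions a single 4096×4096 picture already turns into a few hundred small files, which is the kind of multiplication that makes the server-side footprint grow so quickly.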

Summary: Live Pivot is hardly a solution for the typical customer I meet. That said, I started to develop my own version of a Pivot-ish solution which would enable one to:

  • Use WCF as a backend to serve collections and to handle large amounts of data (not limited by the client’s memory). One of our customers talks about terabytes of records; how would one handle this in Live Pivot?
  • Use standard POCO entities as data buckets. This goes hand in hand with the WCF requirement, of course.
  • Use any XAML in the UI instead of only pictures. Imagine you want buttons, tooltips, … in the display rather than just Morton-related images.
  • Adapt the partitioning and visualization to customer needs.
  • Filter data iteratively beyond the memory boundary. That is, use the whole SQL database backend as a datapool rather than just the entities in memory.
  • Make use of all the fluid animation features of Silverlight.
  • Possibly combine diagramming and Pivot-ish visualization together.
  • Run away from the horrible SeaDragon mechanics involving hundreds of little images.
  • Transpose the solution to WPF and share code between the WPF and Silverlight versions.
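
The idea of filtering iteratively against the database instead of against entities in memory can be sketched as follows. This is only an illustration in Python with an in-memory SQLite database, not Twirl code and not a WCF service: the `books` table, its columns and the `build_query` helper are all hypothetical. Each user action adds one criterion, and the accumulated criteria are translated into a parameterized, paged query so that only the visible slice of data ever reaches the client.

```python
import sqlite3

ALLOWED_OPS = {"=", "<", "<=", ">", ">="}


def build_query(table, columns, filters, limit=50, offset=0):
    """Translate a list of (column, operator, value) criteria into SQL.

    Column names and operators are checked against whitelists because
    they cannot be passed as bound parameters; values are always bound.
    """
    clauses, params = [], []
    for column, op, value in filters:
        if column not in columns or op not in ALLOWED_OPS:
            raise ValueError(f"unsupported filter: {column} {op}")
        clauses.append(f"{column} {op} ?")
        params.append(value)
    where = (" WHERE " + " AND ".join(clauses)) if clauses else ""
    sql = f"SELECT * FROM {table}{where} ORDER BY rowid LIMIT ? OFFSET ?"
    return sql, params + [limit, offset]


# Hypothetical sample data standing in for the real backend.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, year INTEGER, price REAL)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?)",
    [("A", 2005, 30.0), ("B", 2008, 45.0), ("C", 2009, 20.0), ("D", 2010, 55.0)],
)
COLUMNS = {"title", "year", "price"}

# Step 1: the user restricts the year.
filters = [("year", ">=", 2008)]
sql, params = build_query("books", COLUMNS, filters)
step1 = conn.execute(sql, params).fetchall()

# Step 2: the user refines the same selection by price.
filters.append(("price", "<", 50.0))
sql, params = build_query("books", COLUMNS, filters)
step2 = conn.execute(sql, params).fetchall()

print([row[0] for row in step1])  # ['B', 'C', 'D']
print([row[0] for row in step2])  # ['B', 'C']
```

The point of the design is that the filter state lives on the client while the data stays on the server; the client memory only ever holds one page of results.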

This Silverlight library is called ‘Twirl’ and contains nearly all the features above. The things which still need to be worked on are the WCF integration and the dynamic querying of the backend.

While developing Twirl I discovered that a whole lot of features are really available out of the box in Silverlight (together with the SL toolkit, Expression Blend etc.) which made the development easy, in particular:

  • The new fluid animation features of SL4 and Blend
  • The ability to databind lists to any path, not just ItemsControl and ListBox
  • The filtering and sorting of data through ICollectionView and related classes
  • The MVVM pattern with good support in Blend
  • Easing functions, animations etc.

The biggest challenge was really how to partition data into several columns. That is, how do you organize an arbitrary set of data, on the basis of a certain datatype, into a limited number of partitions? Here I had to go deep into unknown territory like data mining, clustering algorithms (k-means in particular), text analysis and more. At some point I developed an F# clustering library but had difficulty mapping the F# vector type onto C# types… and gave up the interop effort (it was a good learning exercise though). After much research and a lot of trial and error I developed my own view on the problem, and a home-made algorithm came out of this. Currently the partitioning is quite good, but more datasets and types of data need to be used in order to refine the algorithm.
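
To make the partitioning problem concrete, here is a minimal one-dimensional k-means sketch in Python. It is not the home-made algorithm mentioned above, only the textbook technique named in the text, applied to a single numeric property; the function name, the quantile-based seeding and the sample prices are assumptions made for the illustration.

```python
def kmeans_1d(values, k, iterations=50):
    """Partition numbers into at most k buckets with plain k-means.

    Centroids are seeded evenly across the sorted data so the result is
    deterministic; empty buckets are dropped from the result.
    """
    data = sorted(values)
    if not data or k < 1:
        raise ValueError("need at least one value and k >= 1")
    k = min(k, len(set(data)))
    centroids = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    buckets = [data]
    for _ in range(max(1, iterations)):
        # Assignment step: every value goes to its nearest centroid.
        buckets = [[] for _ in centroids]
        for v in data:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            buckets[nearest].append(v)
        # Update step: move each centroid to the mean of its bucket.
        updated = [sum(b) / len(b) if b else c for b, c in zip(buckets, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return [b for b in buckets if b]


prices = [5, 6, 7, 48, 50, 52, 200, 210]
print(kmeans_1d(prices, 3))  # [[5, 6, 7], [48, 50, 52], [200, 210]]
```

Each bucket would map to one column in the display. Real data is messier: dates, strings and skewed distributions each need their own notion of distance, which is presumably where a custom algorithm earns its keep.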

Below are some screenshots of Twirl in various situations.

Twirl offers, much like Live Pivot but beyond it as well, the following opportunities:

  • visualization of large datasets stored on SQL Server
  • exploration of local drives through the out-of-browser or COM capability of Silverlight
  • traversals of datasets which share a common set of properties
  • interactive shopping carts and filtering of items
  • cataloguing of ebooks or e-documents in general. Here we are looking into integration with the Windows indexing service and the like
  • customization of the partitioning and the display
  • integration with other visualization techniques (diagramming, charting and whatnot)

The development of Twirl was inspired by a bigger customer project which also embraces Microsoft StreamInsight and correlation research on very large datasets. While Twirl was developed mainly with our customers in mind, I realize it might benefit a lot from the input of others. Whether this means I need to put the code on CodePlex, I don’t know yet. Anyway, if you are interested in Twirl or want to share your thoughts, contact us.
