Saturday, June 03, 2006

Back in the picture

Here I am now, coming to the end of a phase of an agile project. The following conversation occurs:

"Dude", (for it was he), "How does this software work?".
"Well...", and then we stand in a room talking until their brains bleed, and farting until everyone has left the room.

Agile documentation is light on the ground. We leave tests and a build server, and all sorts of other goodies around to help new developers pick things up. How do we leave something that will give non-developers the same help?

Over 12 years ago I was goofing off playing around with visualising object hierarchies in SELF using shiny 3D bouncy algorithms. That led to all sorts of "opportunities", but these days I keep myself focussed on delivering business value. It's rare I get a chance to stretch my graphical wings. So, this looked like a good opportunity.

Joe Walnes and co had already created a funky mini-site on our build server that had a clickable image map of every domain object, and the relations between them. This used a tiny, clever tool that took the Hibernate mapping files and generated a Graphviz "dot" graph. Dot is very cool, and far more powerful than I gave it credit for. Anyway, we ended up with a "domain model". It's the best kind of documentation - accurate, informative, uncluttered - and it helps communicate where improvements can be made.
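
To give a flavour of the idea, here is a minimal sketch in Python (not the tool Joe and co wrote, which I'm not reproducing here). It assumes classic hbm.xml mapping files in which every association element carries an explicit class attribute; the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Association elements in a Hibernate mapping that point at another class.
ASSOCIATION_TAGS = ("many-to-one", "one-to-one", "one-to-many", "many-to-many")


def short_name(class_name):
    """Strip any package prefix so node labels stay readable."""
    return class_name.rsplit(".", 1)[-1]


def mapping_to_edges(hbm_xml):
    """Return sorted (source, target, kind) triples for one mapping document."""
    root = ET.fromstring(hbm_xml)
    edges = set()
    for cls in root.iter("class"):
        name = cls.get("name")
        if not name:
            continue  # assumption: we only handle classes mapped by name
        for tag in ASSOCIATION_TAGS:
            for assoc in cls.iter(tag):
                target = assoc.get("class")
                # Associations without an explicit class attribute are skipped.
                if target:
                    edges.add((short_name(name), short_name(target), tag))
    return sorted(edges)


def edges_to_dot(edges):
    """Render the edges as a Graphviz digraph, one box per domain class."""
    lines = ["digraph domain {", "  node [shape=box];"]
    for source, target, kind in edges:
        lines.append(f'  "{source}" -> "{target}" [label="{kind}"];')
    lines.append("}")
    return "\n".join(lines)
```

Feeding the resulting text to dot (for example with -Tpng, or -Tcmapx for the clickable image map) gives the picture.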

"Dude, how many messages do we send?"

The application under development is simple in concept. It orchestrates messages to back-end systems, and displays the results for the user. Our test strategy was to build simulations of those back-end systems, and to create user acceptance tests that primed the simulators and validated content on the browser. This works pretty well (almost to the point that devs sometimes don't even have to look at the application). Now - the weird thing about such a setup is that, once you've primed the simulators, you have no idea at all what messages are being fired from the application.

The volume and type of such messages is important for tuning the architecture (e.g. for choosing appropriate caching strategies), so we looked around for something that would help describe the behaviour of the system. Sequence is what we found - this does one thing very well, and it was perfect for the kind of diagrams we wanted.

I ruled out graphviz - I was trying to keep the markup readable, and dot requires some magical markup in the input file to line everything up the right way. I also looked at Umlgraph, but rejected it for the same reasons. I think that Umlgraph is probably the way to go for more complex sequence diagrams that need to handle multiple message paths or any kind of asynchrony.

Sequence works in both GUI and headless mode, so we got really fast turnaround when trying to get the right syntax, and then writing a generator from which the headless mode can churn out images.

Getting our simulators to fire off notifications of message receipt was pretty straightforward. Intercepting each web request into the diagram proved a bit of a headache, but since we were using a lightweight app container (jetty), at least we could hack in all sorts of things that we wouldn't have been able to do with something more "industrial".
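
The notification side can be sketched roughly like this in Python. This is an illustration rather than our code: MessageRecorder and its methods are hypothetical names, and the listing it produces is a plain "caller -> callee: message" format, not the input syntax that Sequence expects.

```python
from collections import Counter


class MessageRecorder:
    """Collects (caller, callee, message) events for one acceptance test."""

    def __init__(self, test_name):
        self.test_name = test_name
        self.events = []

    def record(self, caller, callee, message):
        # A simulator (or a web request interceptor) calls this on receipt.
        self.events.append((caller, callee, message))

    def counts_by_message(self):
        """Answer "how many messages do we send?" per message type."""
        return Counter(message for _, _, message in self.events)

    def as_listing(self):
        """Render the events in order, ready for a diagram generator."""
        lines = [f"# {self.test_name}"]
        lines += [f"{caller} -> {callee}: {message}"
                  for caller, callee, message in self.events]
        return "\n".join(lines)
```

A generator would then translate each recorded event into whatever markup the diagramming tool wants.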

After a couple of days, we had sequence diagrams for every acceptance test in our test suite.
They are simple, but they immediately started telling us things about the system that can get lost when you ramp a team up quickly. Furthermore, they will act as a historical record of the design of the system, and as a diagnosis aid.

"Dude, how much of my source code is production code?".

I love this question. In my experience it's nearly always 50-50 between production and test, and 70-30 between framework and business logic. What I wanted to do was show this visually and, if possible, show it as a movie by putting snapshots of the project together.

Treemaps seem to be designed for source code analysis. There are many tools which generate treemaps, but all of them seem to be interactive, which is the last thing you want when you're doing TDD.

The only half-way decent library I could find is infovis, which can work in headless mode, but it really isn't designed for serious "server" functionality. Good enough for my needs though. It will take a TM3 file, and render a treemap to any bitmap-backed Graphics2D you care to give it. What is cunning is that you can (with a bit of head scratching) map visual properties such as size, colour, etc. to arbitrary parameters of the TM3 file.

What parameters am I interested in? Well, with a bit of processing of the subversion log file, we ended up with a TM3 file containing the number of checkins to every file on the project. I think we can take this forward to include test coverage metrics, file size, etc.
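
The processing step looks something like the Python sketch below. It is not our actual script. It assumes the text output of "svn log -v", and it assumes the TM3 layout as I understand it: a line of attribute names, a line of attribute types, then one tab-separated row per file holding the attribute values followed by the path elements. The function names are made up for illustration.

```python
from collections import Counter


def count_checkins(svn_log_text):
    """Count how many revisions touched each path in `svn log -v` output."""
    counts = Counter()
    in_paths = False
    for line in svn_log_text.splitlines():
        if line.startswith("Changed paths:"):
            in_paths = True
        elif in_paths and not line.strip():
            in_paths = False  # a blank line ends the changed-paths block
        elif in_paths:
            # e.g. "   M /trunk/src/Foo.java" or "   A /new (from /old:12)"
            action_and_path = line.strip().split(" ", 1)
            if len(action_and_path) == 2:
                path = action_and_path[1].split(" (from ", 1)[0]
                counts[path] += 1
    return counts


def to_tm3(counts):
    """Emit one TM3 row per path: the check-in count, then the hierarchy."""
    rows = ["Checkins", "INTEGER"]
    for path, n in sorted(counts.items()):
        parts = [p for p in path.split("/") if p]
        rows.append("\t".join([str(n)] + parts))
    return "\n".join(rows) + "\n"
```

Directories that appear in the log are counted like files here, so a real script would probably want to filter them out before rendering.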

Put these two together, and you have an interesting picture of the history of our project. As it turns out, production code is about 50-50 with test code, but this agile project had only about 20% framework code.

"Dude, what does it mean?"

This is the most common response from everyone I've shown the treemap to. Sometimes, I could hang up my pretty pictures at this point. Because it is visual, it is difficult to quantify. All you see is a set of patterns that the human brain is great at matching. What I like about this is the way that visualisations trigger off a cycle of questions, each of which can be answered by data analysis. "Oooh - is that red one the one with most check-ins? Why is that?". But often it stops there - in a dead-end Excel column. If we complete the cycle - and use visual metaphors for data - then we carry on asking questions. My hope is that we keep this cycle going, and use it to understand patterns of behaviour across projects.
