Sunday, June 22, 2008

Agile infrastructure - missing pieces

My last 5 or 6 Agile projects have involved non-trivial architectures.  By this I mean that they've been more than a browser, application server, and a database.  While I would urge all people with architectural responsibility to avoid complexity, sometimes it's not feasible to simplify the architecture prior to first release. 

For the record, there are several reasons why complex architectures perform poorly, and there are several reasons why the agile approach exposes these deficiencies.  I'm assuming that an agile team will be aiming to allow a single developer pair to implement or fix any story that the business needs.

I'm going to talk about how complex architectures can adversely affect the velocity of the development team, and then throw around some patterns for offsetting that drag.

o) Changing environments - even with simple architectures, if there is a shared dependency (such as a shared DB schema or a network service), you can assume that someone will make a change to that dependency, and it won't be when the developer pair want it to be changed.  Typically shared dependency changes affect the entire development team, not just individual developers, causing either a huge immediate loss of development velocity, or a deferred loss of velocity due to reductions in quality.

o) Waiting for knowledge - complex environments often use a mix of technologies that developers need time to become competent in.  Such lead times reduce velocity.  In addition, having "experts" means that either the expert is put under huge pressure to deal with issues that exceed their capacity, or the expert is under-utilized.

o) Investigation - when something does break in a complex architecture, it is often not immediately apparent why.  Typically there are multiple log files, multiple system accounts, multiple configurations (sometimes data driven), and multiple network servers all collaborating.  The time it takes to determine the cause of a failure reduces velocity.

Suggested Patterns:

o) Sandbox environment - This means giving each developer pair a shared-nothing environment in which to develop software.  It is then the responsibility of the pair to manage their own environment, and to promote standards for this environment.  Self-management means that the developer pair may make breaking changes without affecting others, and can also rule out outside interference if their own environment does break.  Providing a predictable self-managed environment forces experts to share knowledge, and to develop tooling that empowers the developer pair.  Conversely, developers will create tooling that facilitates common tasks, and share it with the rest of the team.  Note this shared-nothing environment is not necessarily restricted to a single machine, since it is desirable to be able to develop on a production-similar stack of technologies.

o) Domain model for environment - This means building software and tooling that represents the development environment.  Using a domain model encourages a consistent language when referring to architectural pieces, and also allows automated reasoning about a given environment.  By allowing all architectural tooling to understand a common domain model, it becomes possible to automate the setup of monitoring tools, diagrams, and profiling.  Avoid IDE and product-specific tools to manage the domain model (although they may be used as appropriate by teams), and focus on a standard of deployment and configuration derived from the environmental domain model.  For example, use programmatic configuration of Spring contexts that is driven from the domain model, rather than using property-file based configuration.
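
To make the domain model idea a little more concrete, here is a minimal sketch in Python.  The class names, host names and ports are all made up for illustration; the point is only that monitoring targets and configuration are derived from one model rather than maintained by hand.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Service:
    """One architectural piece (hypothetical names throughout)."""
    name: str
    host: str
    port: int


@dataclass
class Environment:
    """A whole environment, e.g. one developer pair's sandbox."""
    name: str
    services: list[Service] = field(default_factory=list)

    def monitoring_targets(self) -> list[tuple[str, int]]:
        # Every service in the model is something we can monitor.
        return [(s.host, s.port) for s in self.services]

    def settings_for(self, service_name: str) -> dict[str, str]:
        # Configuration is derived from the model, not from a property file.
        s = next(s for s in self.services if s.name == service_name)
        return {f"{s.name}.host": s.host, f"{s.name}.port": str(s.port)}


sandbox = Environment("pair-1-sandbox", [
    Service("database", "db.pair1.example", 5432),
    Service("app-server", "app.pair1.example", 8080),
])
```

The same `Environment` object could feed a diagram generator or a deployment script, which is where the consistent language starts to pay off.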

o) Branching by abstraction - Agile development teams often wish to change software and hardware architecture in response to issues that have been found.  They recognize that while hacks are appropriate in production support branches, such hacks have little place in the main development branch.  Architectural transforms may range from changing a persistence mechanism to switching database vendors.  Given that one team may wish to make a significant architectural change, they should avoid "big bang" introductions.  Once time-boxed spikes have been performed (to assess feasibility), the vision for the change should be shared with the team.  Once committed to the change, work starts by incrementally transforming the architecture.  These changes are distributed across the teams in small slices (through the main branch source control), potentially with two implementations co-existing within the same application, and switched over using configuration.  This allows functional software to be delivered to production at any point in the transformation.
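
As a rough illustration of two implementations co-existing behind a configuration switch, here is a small Python sketch.  The `OrderStore` abstraction, both implementations and the `order_store` config key are hypothetical; a real transformation would put the actual old and new persistence mechanisms behind the interface.

```python
import json
from abc import ABC, abstractmethod


class OrderStore(ABC):
    """The abstraction the rest of the application codes against."""

    @abstractmethod
    def save(self, order_id: str, order: dict) -> None: ...

    @abstractmethod
    def load(self, order_id: str) -> dict: ...


class LegacyOrderStore(OrderStore):
    """Stands in for the existing persistence mechanism."""

    def __init__(self):
        self._rows = {}

    def save(self, order_id, order):
        self._rows[order_id] = dict(order)

    def load(self, order_id):
        return dict(self._rows[order_id])


class DocumentOrderStore(OrderStore):
    """Stands in for the new mechanism being introduced in small slices."""

    def __init__(self):
        self._docs = {}

    def save(self, order_id, order):
        self._docs[order_id] = json.dumps(order)

    def load(self, order_id):
        return json.loads(self._docs[order_id])


def make_order_store(config: dict) -> OrderStore:
    # Configuration decides which implementation is live; default is the old one.
    stores = {"legacy": LegacyOrderStore, "document": DocumentOrderStore}
    return stores[config.get("order_store", "legacy")]()
```

Because both implementations live on the main branch, the switch can be flipped per environment, and flipped back, without a long-lived feature branch.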

o) Deployment Automation - setting up a sandbox environment for a given developer pair is a complex task.  As such it should be an automated task, provided from the main automated build script.  This may mean automating the use of ssh in order to clean and create new database schemas, and to deploy EJBs or services.  We have found that dynamic languages (such as Ruby and Python) make a great alternative to shell scripts for these tasks.
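
A sketch of what such a script might look like in Python follows.  The host names and the remote scripts (`scripts/recreate_schema.sh`, `scripts/deploy_services.sh`) are placeholders I have invented; only the overall shape (build the ssh commands from the pair's name, then run them or dry-run them) is the point.

```python
import shlex
import subprocess


def sandbox_steps(pair, db_host, app_host):
    """Build the ssh commands that reset one pair's sandbox.

    The remote scripts named here are hypothetical placeholders.
    """
    schema = f"sandbox_{pair}"
    return [
        ["ssh", db_host, f"scripts/recreate_schema.sh {shlex.quote(schema)}"],
        ["ssh", app_host, f"scripts/deploy_services.sh {shlex.quote(schema)}"],
    ]


def run_steps(steps, dry_run=True):
    """Run each step in order, or just print it when dry_run is set."""
    rendered = []
    for step in steps:
        rendered.append(shlex.join(step))
        if dry_run:
            print("would run:", rendered[-1])
        else:
            # check=True stops the build at the first failing step.
            subprocess.run(step, check=True)
    return rendered
```

Calling `run_steps(sandbox_steps("pair1", "db.example", "app.example"))` prints the commands without touching any machine, which is handy while the script itself is still being developed.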

o) Automated monitoring as acceptance criteria - Identifying failures is made much easier if there is a single place to find information about system availability.  Those responsible for architecture should mandate monitoring of a new service as the success criteria of that service.  It is possible to automate the creation of hosts and services (and groups) for open source monitoring tools such as Nagios, and Ruby has excellent libraries for basic network service connectivity checking.  The level of monitoring required in the acceptance criteria will depend on the value of the service.  For instance, if a duplicate server is needed for load balancing, the monitoring criteria may ping the load balancer to ensure that it can see the new server.  On the other hand, if the new piece is an ESB, the criteria may eschew basic IP connectivity in favor of firing sample messages and verifying downstream services receive the forwarded message(s).
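
Here is a minimal Python sketch of both halves: a basic TCP connectivity check, and generation of Nagios object definitions for a new service.  I am assuming the `generic-host` and `generic-service` templates and the `check_tcp` command from the Nagios sample configuration are available; adjust to whatever your installation defines.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Basic connectivity check: can we open a TCP connection?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def nagios_definitions(host_name, address, port):
    """Render a host and a TCP service check as Nagios object definitions.

    Assumes the generic-host / generic-service templates and the check_tcp
    command from the Nagios sample config exist in the target installation.
    """
    return (
        "define host {\n"
        "    use        generic-host\n"
        f"    host_name  {host_name}\n"
        f"    address    {address}\n"
        "}\n"
        "define service {\n"
        "    use                  generic-service\n"
        f"    host_name            {host_name}\n"
        f"    service_description  TCP {port}\n"
        f"    check_command        check_tcp!{port}\n"
        "}\n"
    )
```

A story's acceptance test could then simply assert `port_is_open(...)` for the new service, and the build could write the generated definitions into the monitoring server's config directory.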

Sunday, April 27, 2008

Technology and process innovation

In an agile/lean software development team, discussion is invited, but coordinated.  I've noticed that people who are passionate about technology or process often feel friction if they don't get a good hearing for their idea.  Equally, the technical/process leadership have to help everyone on a team achieve consistency.  Regardless, there should be no "status quo"...

I've been playing around with a pipeline approach to technology/process, where the team can
  • achieve consensus on the current state, and next steps
  • get rid of things that aren't working
  • propose blue-sky ideas
  • get support for taking almost-working ideas through to completion
So, every iteration, we can build up a map of our technologies and practices, and then select and shine the elements that we think are important.
  • Deprecated - was useful once, but we should remove uses of this practice/tech when it is found
  • Definite - should be (and is being) used unless it is seriously mismatched to the problem at hand
  • Sound - this is a really good, proven idea/tech that isn't being used yet (or only sporadically), but we think it should be adopted
  • Tentative - this would be really good, but it might have some issues that make it unsuitable
  • Radical - as a concept it solves some problems we're having, but it may raise more issues than it solves

For instance, we might have the following map (example: for a Java-based web site)
  • Deprecated - Struts, JDK 1.4
  • Definite - JDK 1.5, Struts2, Spring Dependency Injection, Maven, Hibernate, Continuous Integration, CVS
  • Sound - Test Driven development, Subversion, Freemarker
  • Tentative - BDD and acceptance criteria executed with Rspec and selenium, continuous pairing
  • Radical - Scala
  • Out - consensus is around not using this tech (e.g. .NET on a Java project, to name a simplistic example)
This is just a snapshot.  You can see a mix of practices and technologies here.

Each iteration (for some value of each), I would like the team to produce the following:
  • System guardians for each definite technology/practice
  • A safety-factor of 1-5 from each team member for every element on the map
  • A vote/score for each sound/tentative/radical idea on the map - this is the priority associated with adoption
  • Actions that can be taken to move prioritized items towards definite (for instance,  a 15 minute "topic of the day")
In this way, there is a collaborative understanding of what works, what doesn't, and what we are doing about it.  

Some thoughts around this:
  • I think it's OK for an initially unpopular idea to stay up on the map, because it allows the proponent to feel included, and it stays in the collective memory for times when the idea is the right one.
  • No definite process/technology is immune from deprecation - although part of the criteria for moving from sound to definite is to address issues of how to phase out the old process/tech
  • Items can be split - there are situations where a technology provides some benefits, but only if it is used in a particular way.  So, Spring DI may be definite, but Spring MVC may be out.
  • A lot of discussion around these things will happen in the retrospective anyway, but the innovation map should be persistent and displayed on the team room wall.
  • The "system guardian" pattern is great for identifying subject matter experts, and then making them able to move on to other areas.  Essentially, each system guardian is responsible for finding 2 or 3 other people and pairing with them until they are also system guardians.  Once you have 4 or 5 names as guardians, the knowledge will propagate quickly.

The DebtStream Guards

In agile delivery, there is often a pattern of reserving a pair or two (depending on team size) for technology and process maintenance - the technical/process "debt" stream.

I've used this stream effectively to
  • Simplify the build
  • "Proof of concept" 3rd party software integration
  • Replace crufty code with a big restructuring
  • etc...
This team is rarely made up of the same people.  In fact the idea is often to seed a pair who will then roll into the main streams of development, spreading their experience, while another pair backfills the debt stream.

Monday, May 21, 2007

A few bits of language related to enterprise deployment - someone send me some references, there are probably other, better, names for this:

The differences between "green/blue" and "silver/gold" are subtle, and should probably be generalised.

Green-blue domains

Completely parallel, independent, and symmetric hardware instances at all tiers of an application, only one of which is live at any time. This allows installation/upgrade of applications and data on the non-live partition before switching over the whole partition by just using a load balancer at the front. Releases are alternated between green and blue. The difference between this and having a staging environment is that each of the blue/green environments is production ready, and there are no subtle differences between them that aren't accounted for.

This pattern allows for frequent, automated releases at the expense of extra hardware.

Note that, in a variation, green and blue can both run the live application. In this case, you need to isolate green (or blue) before upgrading it. There will be a reduction in redundancy and capacity while this occurs.
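
To pin the idea down, here is a toy Python model of the basic (one side live at a time) version.  The class and the `smoke_test` callback are my own invention; in reality the "switch" is a load balancer change rather than an attribute assignment.

```python
class GreenBlue:
    """Toy model: two symmetric partitions, only one of which is live."""

    def __init__(self):
        self.versions = {"green": None, "blue": None}
        self.live = "green"

    @property
    def idle(self):
        return "blue" if self.live == "green" else "green"

    def release(self, version, smoke_test):
        """Install on the idle partition, verify it, then switch over."""
        target = self.idle
        self.versions[target] = version
        if not smoke_test(target):
            # Live traffic never saw the new version; nothing to roll back.
            return False
        self.live = target
        return True
```

Successive successful calls to `release` alternate between blue and green, and the previously live partition keeps the old version around as an immediate fallback.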

Gold-silver data staging

Having data from an upstream process treated as "silver" until verification, and having the current production data treated as "gold". At some point it is necessary (as close to atomically as possible) to promote silver to gold. At the same time, the services used by the previous gold are isolated, and soon after they are updated to the new gold data and joined to production.
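
A small in-memory Python sketch of the promotion step, under the assumption that "as close to atomically as possible" can be approximated by swapping a single reference under a lock.  The `StagedData` class and `verify` callback are hypothetical, and the isolating and rejoining of services is left out.

```python
import threading


class StagedData:
    """Holds the live (gold) data set and an unverified (silver) candidate."""

    def __init__(self, gold):
        self._lock = threading.Lock()
        self.gold = gold
        self.silver = None

    def stage(self, data):
        """Accept new data from the upstream process as silver."""
        with self._lock:
            self.silver = data

    def promote(self, verify):
        """Promote silver to gold only if it passes verification."""
        with self._lock:
            if self.silver is None or not verify(self.silver):
                return False
            # A single swap: readers see either the old gold or the new one.
            self.gold, self.silver = self.silver, None
            return True
```

With real data sets the swap is more likely to be a rename, a symlink flip or a pointer in a database, but the shape (stage, verify, swap in one step) is the same.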


Isolating

The process by which a service is removed from a pool, so that it is no longer accessible from the live application.

Joining

Bringing an isolated server back into a live service set.

Virtual IP switching

A load-balancer supported technique where IPs are added to or removed from the pool associated with a virtual IP. This is one way of isolating/joining a service.
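
As a toy Python model of the idea (not any particular load balancer's API; the class, addresses and round-robin routing are assumptions for illustration):

```python
class VirtualIP:
    """One virtual address fronting a pool of real server IPs."""

    def __init__(self, address, members):
        self.address = address
        self.pool = list(members)
        self._next = 0

    def isolate(self, ip):
        """Remove a server from the pool so live traffic no longer reaches it."""
        self.pool.remove(ip)

    def join(self, ip):
        """Bring an isolated server back into the live pool."""
        if ip not in self.pool:
            self.pool.append(ip)

    def route(self):
        """Pick the next pool member, round-robin."""
        if not self.pool:
            raise RuntimeError("no servers in pool")
        ip = self.pool[self._next % len(self.pool)]
        self._next += 1
        return ip
```

Isolating `10.0.0.2`, upgrading it, and then joining it again is the per-server version of the green/blue variation where both sides run the live application.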