Monday, July 26, 2004

Summer Migration

Clustering is remarkably, almost trivially, easy. There is a current Linux distro called ClusterKnoppix that combines Knoppix and OpenMosix. Knoppix is what is known as a single-system-image distribution: a whole operating system that runs from CD, using a RAM disk as a read/write file system. OpenMosix is a load-balancing extension to the Linux kernel, and provides a transparent mechanism for migrating processes across heterogeneous nodes.

So, in theory, you can plug some old PCs together, boot them off a CD, and they will automatically detect each other and start picking up work passed to them from other nodes.
The funny thing is that this actually works in practice, too. In fact, you only need one CD, because all the other nodes can be configured to boot over PXE.

Setting aside the low-tech spikes (4 nodes running on 1 WinXP box via VMware - yugh), it is incredibly simple to make all this work. But: there is a gotcha...

Process migration

I used to think of threads as different from processes. Threads generally belong to a process, and share a common state with all the other threads in the same process. A file opened in one thread may be read from another thread in the same process; the same is true for access to memory. Processes, however, share very little in the way of resources; they have a protected memory address space, they have their own file descriptors, and so on.

Recently, I've been educated that, under Linux, processes and threads are created in the same way (both come out of the clone() system call); the only thing that differentiates a process from a thread is the decision about which resources to share, and which to protect. Hence, when I run one multi-threaded Java application under Linux, I see many java processes displayed in ps -ef. The threads are merely processes that share a file descriptor table and a memory space.
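
To make that concrete, here is a minimal sketch (plain JDK 1.4-style Java, nothing clever). On a LinuxThreads-based system, each thread it starts shows up as a separate entry in ps -ef:

    // ThreadDemo.java - start a few threads and park them, so you can run
    // "ps -ef | grep java" in another terminal and count the entries.
    public class ThreadDemo {
        public static void main(String[] args) throws InterruptedException {
            for (int i = 0; i < 4; i++) {
                Thread t = new Thread(new Runnable() {
                    public void run() {
                        try {
                            Thread.sleep(60 * 1000); // park for a minute
                        } catch (InterruptedException e) {
                            // interrupted - just exit
                        }
                    }
                });
                t.start();
            }
            Thread.sleep(60 * 1000); // keep the main thread alive too
        }
    }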

The problem comes when OpenMosix wants to migrate a process to another node. It can't really tell the difference between a full-blown process and a mere thread. If the kernel makes the wrong decision (e.g. treating a thread as a self-contained process, saving its memory state, and transferring all its file descriptors to a new node) then it risks threads from the same process running on different nodes. This will lead to tears before bedtime, unless a safe shared-memory architecture is in place (which OpenMosix doesn't have, by the way).

This is a problem that occurs with Java applications, which are invariably multi-threaded. To work around it you have to avoid using Linux system-level threads, and instead use user-level threads. In this case, OpenMosix just sees one process for each Java application, and it is the Java virtual machine that maintains thread state. These user-level threads are known as green threads, and require what is known as cooperative multi-tasking. In other words, the Java virtual machine has to be told when to context-switch - it can't just let the operating system make the switch. This cooperation is achieved through judicious use of the yield() and sleep() methods.

Note that yield() and sleep() are the responsibility of the class libraries and application code, and are not automatically inserted into bytecode.  If a green thread is greedy (e.g. performs intensive CPU calculations without ever touching the disk), it is possible that the JVM will never get the chance to context-switch. 
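
A well-behaved green thread, then, ends up looking something like this rough sketch - a CPU-bound loop that volunteers a context-switch every so often (the chunk size of 100,000 iterations is plucked from the air; the point is just that yield() has to appear somewhere in the code):

    // CooperativeWorker.java - a CPU-bound task that plays nicely under
    // cooperative multi-tasking by yielding at regular intervals.
    public class CooperativeWorker implements Runnable {
        public void run() {
            long acc = 0;
            for (long i = 0; i < 100000000L; i++) {
                acc += i * i;        // intensive CPU work, never touches the disk
                if (i % 100000 == 0) {
                    Thread.yield();  // give the JVM a chance to run another green thread
                }
            }
            System.out.println("done: " + acc);
        }

        public static void main(String[] args) {
            new Thread(new CooperativeWorker()).start();
            new Thread(new CooperativeWorker()).start();
        }
    }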

So, green threads have their disadvantages, but this should be more than made up for by being able to migrate a whole Java application from node to node. All we need is a Java VM which supports green threads.

The basic Knoppix ISO has a fantastically large amount of useful software. It even has Sun's latest Java Runtime (JDK 1.4.2). Unfortunately, JDK 1.4 doesn't implement green threads. This means that our JVM process cannot be migrated.

At this point, I'm not quite sure how to get around this problem, but there are options. We only need to migrate when we fork a new Java process, not when we need a new thread (there's a sketch of this after the list below), so we can:
 
  • Somehow tell OpenMosix to migrate the Java process at startup, but never migrate it after that, or,
  • Make each Java process a member of a PVM (parallel virtual machine), so that in effect we get the above solution for free. I'm not sure how easy it is to get OpenMosix nodes to automatically join a PVM, but we've seen examples where povray can automatically distribute each frame of an animation across a cluster.
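
Either way, the shape of the application code is the same: spawn a whole child JVM per unit of work, rather than a new thread. A rough sketch (the Worker class here is hypothetical - substitute whatever actually does the work):

    // ForkPerTask.java - fork one child JVM per task. Each child is a
    // self-contained process with its own address space, so OpenMosix
    // is free to migrate it to another node.
    public class ForkPerTask {
        public static void main(String[] args) throws Exception {
            String task = (args.length > 0) ? args[0] : "task-1";
            Process child = Runtime.getRuntime().exec(
                    new String[] { "java", "Worker", task });
            int exitCode = child.waitFor(); // parent blocks; child may migrate
            System.out.println("Worker exited with code " + exitCode);
        }
    }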

Scalability of development process

My current project has an agile mind-set, but a slow development process. The overhead of picking up a CR, checking out, testing, checking in, integrating, and deploying a build is currently getting up towards the 3-hour mark. This is a fixed overhead associated with any development work, not including the actual engineering effort of implementing a change request.

This overhead can be divided into two parts. First, there is the overhead that is fixed for each developer - i.e. as the number of developers increases, each individual's overhead remains relatively constant. The other sort is the overhead that increases with the number of developers.

For instance, if the build is broken, no-one can check in. This blocks all development. If you have 100 developers, each will be blocked for at least 1 hour plus the length of whatever fix the broken build needs. 100 developer-hours lost, just like that.

However, if a local build process breaks, only the developer who broke it is affected. Alternatively, if each local build takes 1 hour (i.e. 1 hour of lost developer time waiting for it to complete), then while each and every developer suffers that overhead, they are at least free to pair with other developers in the meantime.

If the build is degenerate, then effort must be placed first and foremost on reducing the impact of a broken shared build. Only if it is not cost-effective to improve the build server should work commence on reducing the overhead of the local build process.

To that end, we have been looking at making our build server extremely scalable and cost-effective, by clustering it using free software.

Flashback

Last night, I tried to explain to L (an architect) what this "Chaos Theory" thang is.  She's doing a thesis on the "Tectonics of Smoke", and has been asked to actually try and understand the physics/maths behind how smoke behaves.  Not a particularly easy subject...

Still, she pulled out an old book of mine: Chaos by James Gleick. God knows what I'd have done without it, if only for the pictures. In half an hour we covered complex numbers, iterative functions, rounding errors, the Lorenz attractor, the Mandelbrot set, Mitchell Feigenbaum, and weather prediction.

This isn't the first time I've explained these concepts to architects, and I find it amazing that they hadn't been exposed to these ideas before. With few exceptions, people in this profession seem to be polymaths - understanding aspects of social policy, engineering, law, aesthetics, and design. It occurred to me that architects are the ideological children of Newton's era; the end of the 17th century was the last time that a single person could really understand everything about many subjects.

Calculus in the community

This blog will contain random thoughts on things of a generally geeky nature - pseudo science, software development, technology, and maybe a few kittens.

This morning there was a discussion on the radio about the rivalry between Newton and Leibniz over calculus. Newton discovered it first, but kept it to himself. Leibniz discovered it later, and had a much more effective notation. They warred against each other (through intermediaries), and eventually Newton's notation dominated British science (and set it back 50 or so years). Who was "first" seemed more important than who had the better solution.

This contrasts with the situation of H. She's currently watching the progress of a paper she's trying to publish. The sticking point is that two different researchers made the same discovery - they agree on the data, but not on the conclusions. Individually, each could publish, but it would be fairly unethical to do so. Joint publication is the desired outcome, but because they have contradictory conclusions the publication process has ground to a halt. Thus they work together to tone down these contradictions, rather than letting the scientific community perform this process.