Wednesday, 4 January 2012

Relativistic Databases

Finally I have a good title for something that I have been thinking about for a long time.

Once there was a brilliant man named Isaac Newton who wrote down the physical laws of kinematics, that is, how things move in space and time.  Across the universe the clocks that measured how things moved were synchronised: they all showed the same time and they all ticked at the same rate.

In the early 20th century that view of the universe was revised with the concept that only the clocks that could see each other could be synchronised, and that to stay synchronised they had to tick at different rates.  A new relativistic view of the world emerged.

In the same way, software engineers have designed software around synchronised clocks in the form of ACID transactions, with locking mechanisms to ensure that data is updated in the "correct" order.  There is only one order, synchronised across the computing universe.  Just as early 20th century physics reached the boundaries of the marvellous achievements of Newtonian physics and needed a new theory or paradigm, so too the current generation of database designers is having to move on.

As the old notion of a global clock was lost in physics, so it must be in computing, because in reality information moves at the speed of network traffic, in much the same way as light moves across the universe.  Things that happen today on the far side of the universe will not be visible here until some future point in history; in the same way, data loaded into a computer system on the other side of the planet will take time to get here.

Why is this important? The ACID transaction model requires tight synchronisation that limits scalability and redundancy.  In short it forces computerised databases onto central servers and thereby introduces critical failure points.  The relativistic database is in contrast distributed and fault tolerant.  Why should a shop in Wellington be unable to trade because a server in Christchurch has been hit by an earthquake?

So what will this new computing world look like?  The old notion that programs must process data in order must go.  Data needs to be what I call "temporally commutative".  Let us consider two events happening at the same time on opposite sides of the globe, and let us call them event A and event B.  These events are recorded in documents A and B respectively, and these documents are represented as data in a computer network.  For there to be consistency, the final state S must be the same irrespective of the order in which the data for events A and B arrive.  In fact the final state of the computer system must be uniquely determined by the set of documents it contains, irrespective of the order of their arrival.
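As a minimal sketch of that requirement, assume each document is a hypothetical `(doc_id, amount)` pair and the state is simply the collection of documents received so far, keyed by id.  The names `apply_document`, `final_state` and `balance` are illustrative only and do not come from any particular database.

```python
def apply_document(state, doc):
    """Return a new state that also contains doc, without mutating state."""
    doc_id, amount = doc
    merged = dict(state)
    merged[doc_id] = amount  # keyed by id, so a duplicate delivery changes nothing
    return merged


def final_state(arrivals):
    """Fold documents into a state in whatever order they happen to arrive."""
    state = {}
    for doc in arrivals:
        state = apply_document(state, doc)
    return state


def balance(state):
    """Derive a value from the state; it depends only on the set of documents."""
    return sum(state.values())


doc_a = ("A", 100)  # event A: a deposit recorded on one side of the globe
doc_b = ("B", -30)  # event B: a withdrawal recorded on the other side

s_ab = final_state([doc_a, doc_b])
s_ba = final_state([doc_b, doc_a])
assert s_ab == s_ba  # same final state S for either arrival order
```

Because the state is a mapping from document id to content, comparing two states ignores arrival order, which is the property the final state S is meant to have.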

Achieving this end is about reinterpreting document semantics.  At the place and time where the document was issued, it was considered correct and proper to issue it.  That it might be considered improper in this time and place is not material, as we are not and cannot be in full possession of all the facts, and new facts can and do become available all the time.  If data representing a withdrawal of cash over a credit limit arrives, the customer already has the money, so do not dismiss the document as invalid.  There may still be a deposit on the way, or an increased credit limit that has not arrived yet.
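One way to picture the credit-limit case is the sketch below.  It rests on several assumptions of mine: documents are hypothetical `(kind, amount)` pairs, each document in the list is a distinct document, and when several limit documents exist the largest one wins (an arbitrary rule chosen only because it does not depend on arrival order).  The over-limit withdrawal is never rejected; the account is merely reported as over its limit until further documents arrive.

```python
def account_view(documents, default_limit=0):
    """Summarise an account from whatever documents have arrived so far.

    documents: iterable of (kind, amount) pairs, where kind is one of
    'deposit', 'withdrawal' or 'limit' (hypothetical document types).
    """
    docs = list(documents)
    deposits = sum(amount for kind, amount in docs if kind == "deposit")
    withdrawals = sum(amount for kind, amount in docs if kind == "withdrawal")
    limits = [amount for kind, amount in docs if kind == "limit"]
    # Assumption: the highest limit seen applies, so order cannot matter.
    limit = max(limits) if limits else default_limit
    current = deposits - withdrawals
    return {"balance": current, "limit": limit, "over_limit": current < -limit}


withdrawal = ("withdrawal", 150)
deposit = ("deposit", 200)

early = account_view([withdrawal])           # the withdrawal arrives first
later = account_view([withdrawal, deposit])  # the deposit catches up
swapped = account_view([deposit, withdrawal])
```

Here `early` reports the account as over its limit, while `later` and `swapped` are identical and within the limit: the withdrawal was accepted as a fact all along, and only the interpretation changed as more documents arrived.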

The property of temporal commutativity allows only a single final state for any collection of documents.  It also considerably reduces the complexity of specifying and testing systems, in that assuring temporal commutativity for a set of n documents takes on the order of n squared test cases, one for each pair of documents.  If commutativity is not assumed then there are many more cases to consider, as n documents can arrive in n! different orders.  To illustrate, consider a set of three numbers a, b and c.  If they are all joined with a + then there is only one answer: a+b+c = a+c+b = b+a+c = b+c+a = c+a+b = c+b+a.  If we were to use a minus operator then some of those expressions would have different answers.

To illustrate, let a=1, b=2, c=5.  Then a-b-c = a-c-b = -6, b-a-c = b-c-a = -4 and c-a-b = c-b-a = 2.  Because the - operator is not commutative, the order of the terms is significant.
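The arithmetic above can be checked mechanically.  The short sketch below evaluates every ordering of the three numbers under each operator and collects the distinct answers; `distinct_results` is a helper name of my own choosing.

```python
from functools import reduce
from itertools import permutations
import operator


def distinct_results(op, values):
    """Apply op left to right over every ordering of values; return the set of answers."""
    return {reduce(op, ordering) for ordering in permutations(values)}


values = (1, 2, 5)  # a, b, c from the example above

sums = distinct_results(operator.add, values)   # one answer for all six orderings
diffs = distinct_results(operator.sub, values)  # three different answers
```

With + all six orderings collapse to a single result, whereas with - the result depends on which term comes first.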

Finally, there are cases where ACID transactions are best, such as selling seats in a stadium for a match.  There is one authoritative perspective which means that seats are not oversold, and as with everything it is all about using the right tool for the job.
