The end of the monolithic database engine dream

It seems I like calling my posts “The End of…” 🙂

Anyway, Rob Klopp wrote an interesting post titled “Specialized Databases vs. Swiss Army Knives”. In it, he argues against Stonebraker’s claim that the database market will split into three to six categories of databases, each with its own players. Rob counters that data is typically used in several ways, so it is cumbersome to have several specialized databases instead of one decent one (like HANA of course…).

I have a somewhat different perspective. In the past, let’s say ten years ago, I was sure that the Oracle database was the right thing to throw at any database challenge, and I think many in the industry shared that feeling (each with their own favorite database, of course). There was a belief that a single database engine could be smart enough, flexible enough and powerful enough to handle almost everything.

That belief is now history. As I will show, it is now well understood and acknowledged by all the major vendors that a general-purpose database engine just can’t compete in all high-end niches. HOWEVER, the existing vendors are, as always, adapting. All of them are extending their databases to offer multiple database engines inside their product, each for a different use case.

The leader here seems to be Microsoft SQL Server. SQL Server 2014 (currently at CTP2) comes with three separate database engines. In addition to the existing engine, they introduced Hekaton – an in-memory OLTP engine that looks very promising. They also delivered a brand new implementation of their columnar format – now called clustered columnstore index – which is now fully updatable and is actually not an index at all: it is a primary table storage format with all the usual plumbing (delta trees that absorb new rows, plus a tuple mover process that converts them to columnar format once enough rows have accumulated).
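
To make that plumbing a bit more concrete, here is a toy Python sketch of the general delta-plus-tuple-mover idea. It is not how SQL Server implements it: the class name, the method names and the tiny `move_threshold` are all made up for illustration, and a plain list stands in for the delta tree.

```python
class ToyColumnstoreTable:
    """Toy model: a row-oriented delta store plus columnar segments."""

    def __init__(self, columns, move_threshold=4):
        self.columns = list(columns)
        # Hypothetical threshold, kept tiny so the example is easy to follow.
        self.move_threshold = move_threshold
        self.delta_rows = []   # recent inserts, stored row by row
        self.segments = []     # each segment maps column name -> list of values

    def insert(self, row):
        # New rows always land in the delta store first.
        self.delta_rows.append(dict(row))

    def tuple_mover(self):
        """Convert full batches of delta rows into columnar segments."""
        moved = 0
        while len(self.delta_rows) >= self.move_threshold:
            batch = self.delta_rows[:self.move_threshold]
            self.delta_rows = self.delta_rows[self.move_threshold:]
            segment = {c: [r[c] for r in batch] for c in self.columns}
            self.segments.append(segment)
            moved += len(batch)
        return moved

    def scan_column(self, column):
        # A scan has to read both the columnar segments and the delta store.
        for segment in self.segments:
            yield from segment[column]
        for row in self.delta_rows:
            yield row[column]


table = ToyColumnstoreTable(["id", "amount"], move_threshold=4)
for i in range(10):
    table.insert({"id": i, "amount": i * 10})

moved = table.tuple_mover()               # two full batches get converted
total = sum(table.scan_column("amount"))  # reads segments plus leftover rows
```

The point of the sketch is only that inserts stay cheap (append to a row store) while scans mostly hit column-organized data, which is why this kind of feature needs its own storage format and code paths.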

Oracle has been promoting two engines for a long while – TimesTen for low-latency, intensive OLTP (in-memory of course), and the Oracle database. Within the Oracle database, they are planning to introduce a second engine for data warehousing in the next release (with an in-memory columnar format) – which I discussed in the past. And IBM DB2 already introduced an in-memory columnar engine, called BLU Acceleration, six months ago.

Now, why do I call them engines? It seems to me all these are major, intrusive features, with whole new on-disk formats, memory representations, processing logic and code paths, well beyond a new index type or similar change. In addition, some come with different or no locking/latching and other dramatic internal changes.

So, to wrap it up, while I believe that specialized engines are the new must, the existing players are working to implement multiple engines within their products to keep them relevant. As for having a single decent database for everything: if it does deliver the required high performance and scalability, at low cost and with rich enough functionality (HA, security, resource management, a great optimizer, etc.), then there is always a place for a single database serving different workloads, if and when it delivers.
