The end of the classical MPP databases era

Over the years, enterprises realized that their many isolated systems generate a vast amount of data. What if they could put all that massive data into one centralized platform, correlate it and analyze it? Surely that would uncover a wealth of relevant, hidden business insights. Of course, those were the eighties (actually earlier), so this new platform was called the Data Warehouse. And it was good. So, over the years, it became very popular and by the end of the nineties nearly every large organization had one or dozen of them.

However, with all the goodness, there were some challenges. Data warehousing required a lot of specialized skills and tools. For example, regular databases couldn’t really support very large data warehouses, so from a certain scale, it required specialized and expensive products – like those from Teradata. As a response to the quick adoption of the data warehouse, the “classical” MPP databases arrived to the market about a decade ago, first Netezza and later the rest of the gang (Vertica, Greenplum etc). They were all relatively low cost, and built parallel processing on top of shared-nothing cluster architecture. They were the one place were you could throw “petabytes of data” and analyze it over a large, mostly-commodity, computing cluster.

So, what happened?

Simply, over the time the market gained experience and its requirements and expectations have evolved. For example, some examples of challenges not solved by existing MPP databases include:

  • The main challenge of a data warehouse is making sense of the data. The classical DW solution is to clean up and standardize the data as it is being loaded (ETL). This is required as the regular MPP database schema is rigid, and schema evolution is hard and painful (just like all relational databases). That standard method is, however, complex and very time consuming, which makes it very hard for the DW to adapt to the constant and frequent changes in its source systems.
    Nowadays, a common requirement is to support dynamic schemas / schema-on-read, so at the very least the frequent schema and data sources changes wont block the ingestion of data.
    This also supports the on-going shift of power from DBAs to the developers.
  • The classical MPP databases have relatively rigid HA and scalability – unlikely that you could add a couple of nodes every week or every month in real life or survive a “once an hour” node failure rate etc. In other words, MPP databases provided scalability but not elasticity, plus HA that is focused on surviving a single node failure.
    Today elasticity is a requirement – inspired by “failure is a common, normal event” mentality of Hadoop and of relying on lower-end commodity servers.
    The next step for this mentality will be to also replace the DR concept with a native multi-data-center active-active support as part of the core architecture, which some of the NOSQL and NewSQL players are advocating.
  • If talking about rigidity, the existing players usually had an assumption of isolation and uniformity – uniform physical nodes, dedicated high performance cluster interconnect and relatively homogenous workload.
    Nowadays with cloud deployments or deployments on a shared Hadoop cluster (maybe virtualized in the future), those assumptions needs to be revisited and products needs to take into account never-ending fluctuation of CPU, I/O and network on shared resources.
  • Finally, MPP databases are propriety, closed-source beasts. That doesn’t fit well with the collaborative open-source ecosystem led by the major web giants which are the main driver behind the existing big data boom, innovation and fast pace. This represents a huge mentality shift – just think why MapR doesn’t own the Hadoop market even though it came with a superior offering early on.

Speaking of mentality shift – one more thing to add .Most organizations that will start implementing a solution increasingly ask for a single “big data” platform – that must support complex SQL but also online/OLTP lookups, batches processing unstructured data, free-text searches, graph analysis and whatever else comes up next (future-proofing). For that, the industry is no doubt standardizing on Hadoop as the unifying infrastructure – and will converge around YARN + HDFS infrastructure. Now, no one wants start its big data journey with two big data platforms, so even if sometimes it could have made sense, it will be very hard to promote such solution at the current industry buzz level. The only exception might be SAP Hana, which built a unique offering of merging OLTP and DW into a single MPP system – skipping the ETL headache altogether (while also of course supports integration with Hadoop).

So, what will happen?

Well, looking at my crystal ball, it seems that popularity will continue to quickly shift to modern MPP databases on top of Hadoop.They will have flexible schema (which sucks, but it’s the only way to keep up with the schema changes and data sources proliferation). They will be significantly less monolithic and will leverage the Hadoop ecosystem instead – for example HDFS for storage management, YARN for resource management, HCatalog for metadata management, maybe even support for various open on-disk file formats etc. So, they will be mostly trimmed down to parallel query optimization and runtime engines.

As for the old MPP players, they would sooner or later try to adapt to this world, but for most of them it is simply too late already. The only player who have committed to such change is Greenplum (now: Pivotal) which started migrating its MPP database to HDFS (and renamed it to HAWQ). Even for them, even though their first release is now out, handling the challenges mentioned will not be easy. But it is still early in the game, and their lead (in my opinion) in query optimization over the rest of the “SQL on/in Hadoop” players may buy them enough time to evolve into a truly leading “next-gen” MPP-on-Hadoop player.


Designing for database multitenancy and Oracle 12c

I’ve been keeping a low profile over the last few months as I’ve been busy working in a bio-informatics startup, but I wanted to join the Oracle 12c release celebration by discussing database multitenancy and adding some reality check on Oracle’s new option.

So, you are building a database app and want it to be multitenant – having different users and companies trusting you and your shared infrastructure with their data. What are your design options?

Single Database, Shared Schema – I guess that’s the most common one (think of most webapps). There is a big security burden on the app here – a SQL injection vulnerability means game over. There is some positive side-effects – easy to do internal reports across all users and no duplicated objects.

If you use Oracle Enterprise Edition you could potentially use VPD to enforce row-level security (isolation) – safeguarding against standard SQL injections. However, SQL injection bugs in your app could be escalated using database-level zero-days / unpatched vulnerabilities.

Single Database, Multiple Schemas – here you can leverage standard database access control features to isolate different users. As a nice bonus, you can have an additional shared schema for your app metadata tables (so you can do regular joins of your metadata tables with customer tables).SQL injections vulnerabilities could still exploit database vulnerabilities for privilege escalation and accessing other customers data.

Single Instance, Multiple Databases – here you have multiple isolated databases sharing the same runtime (memory, background processes, logs etc). Having worked with Greenplum database, I was getting used to that, as this have been a basic feature of PostgreSQL, SQL Server and many others for a decade or two. In a sense, Oracle CDB/PDB is just very lately catching up. Anyway, the databases are isolated (so no join / access possible to another database), but since they share the runtime, SQL injections vulnerabilities could still exploit database vulnerabilities for privilege escalation.

Different Instances, Shared O/S – if you want to have a more secured isolation, the minimum is to have each customer on a separate instance of the database software with a different O/S user. To break the isolation, attacker needs to find a SQL injection vulnerability, run O/S command and exploit unpatched O/S vulnerability to get root. I’m not sure this type of isolation is being really used, as it is quite clumbersome to manage the environment this way, and it is inferior to the last option:

Different Instances, Virtualized – each customer is in its own VM (cloned from some base image), fully isolated. The VMs could grow, shrink and be moved around to manage the changing workload and different customer sizes. This is the only option in my opinion that give real database level isolation between customers (you still have to worry about shared storage and network, although those could also be virtualized, and of course middleware vulnerabilities).

So, what about 12c CDB/PDB?

Well, outside multitenancy use case, CDB/PDB combo should really help simplify Oracle patching (and is included at no extra cost for Enterprise Edition for a single PDB per CDB). That by itself is a good reason to adopt it eventually, once it proves to be stable.

However, for multitenancy use cases, it is harder to justify it. First, business-wise, this is an extra cost option on top of Enterprise Edition. It is priced 17.5K $ per license, so for a modern 16-core server (8 licenses) it costs an additional 140K $ per node! Even with hypothetical 50% or even 75% discounts, it is very hard to find economical justification for that given the alternatives (especially as it can’t guarantee real isolation). For me, this is where I lost interest… Now, there might be some theoretical savings regarding database management, but if you really design a new multitenancy operation, you will surely invest effort to fully automate everything across all layers, and you could still use the free CDB/PDB features to help you get there.

On the technical side, there are some mysteries. The biggest one is that a single CDB can only hold about 252 PDBs. So, if I have 10,000 medium-sized customers, should I install 40 PDBs? After some thinking about it, it seems to me that maybe this feature was originally designed to throw together a relatively small number of heavy databases on a shared cluster to ease resource contention in RAC environment, not as a new paradigm for database multitenancy.  But your guess is as good as mine.

So, for real multitenant isolation, I believe virtualization will make a lot more sense to most customers. It is a very lightweight and proven technology, could be used across all the layers in the stack (with minimal vendor lock-in), provides resource management, best isolation and is (mostly) free. For less isolated multitenant supporting many thousands of users, seperate schemas might do it, and if there will be millions of customers, shared schema might be the only thing that works.

The only case where I think CDB/PDB could be useful for Oracle multitenancy might be if a customer is already heavily invested in a large RAC cluster that is not virtualized, and is now running into challenges of putting many databases on this shared cluster. In other words, it solves an Exadata-specific challenge – if you are married to Exadata and can’t effectively consolidate, it might be worth the pay once you are convinced 12c is stable enough, so your Exadata environment won’t be under-utilized.

Bottom line – CDB/PDB is a great catch-up that will ease future patching, but I believe the fancier use cases around multitenancy don’t make economical and technical sense. What do you think?

Exadata HCC – storage savings revisited

This is a third post in a series regarding Exadata HCC (Hybrid Columnar Compression) and the storage savings it brings to Oracle customers. In Part 1 I showed the real-world compression ratios of Oracle’s best DW references, in Part 2 I investigated why is that so, and in this part I’ll question the whole saving accounting.

So, we saw in part 1 that most Exadata DW references don’t mention the storage savings of HCC, but those who do show an average 3.4x “storage savings”.  Now let’s see what savings, if any, this brings. It all has to do with the compromises involved in giving up modern storage capabilities and the price to pay to when fulfilling these requirements with Exadata.

Let me start with a somewhat weaker but valid point. Modern storage software allows online storage software upgrade. A mission-critical database (or any database) shouldn’t be down or at risk when upgrading storage firmware or software. In order to achieve similar results with Exadata, the storage layer has to be configured as a three-way mirror (ASM High Redundancy). This is actually Oracle’s best practice, see for example the bottom of page 5 of the Exadata MAA HA paper. This configuration uses significantly more storage than any other solution on the market. This means that while the total size of all the data files might be smaller in Exadata thanks to HCC, you still need a surprisingly large raw volume of storage to support it, or you’ll have to compromise and always use offline storage software upgrades – likely the critical quarterly patch bundle, which could take at least an hour of downtime to apply, from what I read on the blogsphere.

To make it a bit more confusing, the Exadata X3 datasheet only mentions (in page 6) the  usable data capacity with 2-way mirror (ASM normal redundancy), even though the recommended configuration is 3-way mirror. I wonder if that has anything to do with later providing less net storage?

Continue reading

Exadata HCC – where are the savings?

this is a second in a series of posts on Oracle’s Exadata Hybrid Columnar Compression (HCC), which is actually a great feature of Oracle database. It is currently locked to Oracle-only storage (Exadata, ZFS appliance etc) and Oracle marketing pushes it hard as it provides “10x” compression to Oracle customers.

In the previous post, I showed that Oracle’s best data warehouse reference customers gets only an average “storage saving” of at most 3.4x. In this post, I’ll investigate why they don’t get the promised “10x-15x savings” that Oracle marketing occasionally mentions. In the next post, I plan to explain why I use double quotes around storage savings – why even that number is highly inflated.

10x compared to what? I remember that in one of the recent Oracle Openworld (was it this year?), Oracle had a marketing slide claiming Exadata provides 10x compression and non-Exadata provides 0x compression… (BTW – please post a link in the comments if you can share a copy). But leaving the funny / sad ExaMath aside, do non-Exadata customers enjoy any compression?

Well, as most Oracle DBAs will know, Oracle have introduced in 9i Release 2 (around 2002) a new feature that was called Data Segment Compression, which was renamed in 10g to Table Compression, in 11g release 1 to a catchy “compress for direct_load operations” and as of 11g release 2 is called Basic Compression. This feature is included in Enterprise Edition without extra cost. It provides dictionary-based compression at the Oracle’s table data block level. It is most suited for data warehousing, as the compression kicks in only during bulk load or table reorganization – updates and small inserts leaves data uncompressed.

What is the expected (and real world) average compression ratio of tables using this feature? The consensus is around 3x compression. Yes, for data warehousing on non-Exadata in the last decade Oracle provides 3x compression with Enterprise Edition!

Continue reading

Exadata HCC – Real-World Storage Savings

this is a first in a series of posts on Oracle’s Exadata Hybrid Columnar Compression (HCC), which is actually a  great feature of Oracle database. It is currently locked to Oracle-only storage (Exadata, ZFS appliance etc) and Oracle marketing pushes it hard as it provides “10x” compression to Oracle customers.

Oracle have bold claims regarding HCC all over. For example in this whitepaper from November 2012,  the first paragraph claims “average storage savings can range from 10x to 15x” and the second paragraph illustrates it with 100TB DB going down to 10TB, with 90TB of storage savings. After that, the paper switch to a real technical discussion on HCC.

So, what does HCC “10x” compression looks like in real life? How much storage savings will Oracle customers see if they move to Exadata and start using HCC?
It is very hard to find some unbiased analysis. So, to find out and start an hype-free discussion, I decided to get some real world data from Oracle customers. Here are my findings.

To start, I needed access to an undisputed data source. Luckily, one can be found on Oracle’s web site – an impressive 76-page long Exadata customer reference booklet from September 2012 containing a sample of 33 customer stories. Obviously, it is not very “representative” – reference customers tend to be more successful than the average ones –  but I think there is still a lot value in analyzing it.  Hey, maybe we’ll find that their storage saving is even larger than 10x-15x, who knows!

So, once I had data,

Continue reading

Why too much concurrency always hurt database performance and how databases manage it

Following my previous two posts on concurrency, I’d like to explain why “too much” concurrency always hurts database performance (in any database), and discuss a couple of common database features that were designed to manage it, including an example from Oracle and Greenplum.

What do I mean by “too much” concurrency? Let’s play with an example. Let’s assume a small, simple database system where each SQL is processed by a single-threaded, CPU-bound process (maybe it is an in-memory database or all the relevant data is cached). Let’s further assume that each SQL takes 10 seconds to execute, and that the system can only efficiently execute four parallel SQLs. So, if we fire up four SQLs at a time every 10 seconds, we will get a throughput of 24 queries/minutes and average response time of 10 seconds. So far, life is good.

But what happens if we fire up 24 queries simultaneously once a minute? Let’s assume no interference between the SQLs and a fair scheduler that cycles between the processes many times per second. In that case, we will still get 24 queries per minute, but all queries will finish about 59-60 seconds, so the average response time will be almost 60 seconds – or six times slower with the same throughput. So, scheduling too many SQLs at once just drove response time through the roof without improving throughput!

Another way that “too much” concurrency hurts performance is

Continue reading

Classical Big Data Reading – CAP Theorem

I decided to try writing once in a while a post on some of the classical papers and topics that had major effect on our big data technologies, and there is no better place to start that than the CAP Theorem.

The CAP Theorem by Eric Brewer was a philosophical fuel behind the so-called NoSQL movement, the battle cry that for a while united them all (at least in 2010). CAP stands for Consistency, Availability, (network) Partition tolerance and the theorem claims that in a distributed system, when there is an inevitable network partition (and the cluster breaks into two or more “islands”), you can’t guarantee both availability (for updates) and consistency. However, it was sometimes dumbed down to to a “Consistency, Availablity, Partition Tolerance – pick any two” slogan to explain why an eventual consistency model for a NoSQL database is legit. The discussion usually classified relational databases as “CA” and typically NoSQL databases as “AP”. Here is one example, and another representative one as an image:

Taken from

Continue reading