This is the first in a series of posts on Oracle’s Exadata Hybrid Columnar Compression (HCC), which is actually a great feature of the Oracle database. It is currently locked to Oracle-only storage (Exadata, ZFS appliance, etc.), and Oracle marketing pushes it hard, as it promises “10x” compression to Oracle customers.
Oracle makes bold claims about HCC everywhere. For example, in this whitepaper from November 2012, the first paragraph claims “average storage savings can range from 10x to 15x” and the second paragraph illustrates this with a 100TB database going down to 10TB, for 90TB of storage savings. After that, the paper switches to a real technical discussion of HCC.
So, what does HCC “10x” compression look like in real life? How much storage savings will Oracle customers see if they move to Exadata and start using HCC?
It is very hard to find unbiased analysis. So, to find out and to start a hype-free discussion, I decided to get some real-world data from Oracle customers. Here are my findings.
To start, I needed access to an undisputed data source. Luckily, one can be found on Oracle’s web site – an impressive 76-page-long Exadata customer reference booklet from September 2012, containing a sample of 33 customer stories. Obviously, it is not very “representative” – reference customers tend to be more successful than average – but I think there is still a lot of value in analyzing it. Hey, maybe we’ll find that their storage savings are even larger than 10x-15x, who knows!
So, once I had the data, I needed a methodology. I decided to pick an easy, fair one: go over the customer references and pick all of those that state a database-wide compression ratio (either as a number or as before/after sizes). This just shows the bottom line – what storage savings customers have really achieved, after all the fine print, trade-offs and technical considerations.

Now, the savings are likely not 100% due to HCC. For example, since Exadata has a better scan rate, some indexes and materialized views might be removed; in other cases, some unused tables might be identified and dropped during the migration process, etc. So, while I’ll attribute all the savings to HCC for convenience, keep in mind that any number we’ll see is very likely inflated and the real results are somewhat lower.
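To make the methodology concrete, here is a minimal sketch of the ratio I computed for each qualifying reference. The function names and the example figures are my own for illustration (the 100TB-to-10TB pair is the whitepaper’s illustration, not a customer result):

```python
# Back-of-envelope helpers for the methodology above: given the
# before/after database sizes quoted in a reference story, compute
# the database-wide "compression ratio" and absolute savings.
# Function names are hypothetical, for illustration only.

def compression_ratio(before_tb: float, after_tb: float) -> float:
    """Overall ratio, e.g. 100TB -> 10TB gives 10.0 (the '10x' claim)."""
    return before_tb / after_tb

def storage_savings_tb(before_tb: float, after_tb: float) -> float:
    """Absolute savings, e.g. 100TB -> 10TB saves 90TB."""
    return before_tb - after_tb

# The whitepaper's illustration: a 100TB database going down to 10TB.
print(compression_ratio(100, 10))   # 10.0 -> "10x"
print(storage_savings_tb(100, 10))  # 90.0 -> 90TB saved
```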
Before the actual results, here are three points regarding the analysis:
- I didn’t include customers with anecdotal storage saving numbers – only those with an end-to-end ratio or exact storage savings. Specifically, this ruled out two customers: Robi Axiata Limited, which provided table compression estimates (2x without performance impact, 7x-10x for history tables), and SK Telecom (10x compression from raw data, but no indication of total database compression ratio or savings).
- A lot of the data warehouse references (11 of them) do not mention any sizing or compression data at all! This includes Allegro Group, GfK Group Retail and Technology, Hong Kong Housing Authority, Immonet GmbH, LinkShare Corporation, Procter & Gamble, Robi Axiata Limited, SK Telecom, Targetbase, Unicoop Firenze and Yamazaki Baking Co., Ltd.
This is quite puzzling – I’ll leave the speculations to the readers as mine are not flattering and will sound like FUD to the Exadata “believers”.
- Some references only mention the DW size before Exadata. Those include Digicel Haiti (was 38TB, mentions “reduced storage requirements”), Hotwire, Inc. (was 10TB) and IDS GmbH – Analysis and Reporting Services (was 12TB).
Again, the same mystery, if we believe in the “10x storage saving” theme.
So, only 10 of the 33 reference customers (and of the 24 DW references) provided database size statistics (all of them DWs). Here are the results, in alphabetical order:
Well, there you have it. I found an average saving of at most 3.4x for Oracle’s best DW references. Again, these results are inflated, both because they come only from the most successful reference customers and because other storage savings (dropping some indexes, materialized views and tables) get mixed in.
If anyone expected HCC to deliver, for Oracle’s best references, an average compression of 10x-15x (or better), these results are probably quite shocking. To me, they actually seem better than expected, probably due to some of the artifacts I mentioned (and some skew from the KyivStar entry, which personally looks a bit fishy – they add 60GB of data per day and were at 444TB? That’s 20 years of data if the CDR rate was flat over the last 20 years…)
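For the skeptical reader, the KyivStar back-of-envelope check works out like this (a quick sketch using only the figures quoted in the reference; whether GB/TB are binary or decimal barely changes the result):

```python
# Sanity check on the KyivStar numbers: 444TB total at ~60GB of new
# data per day. How many years of data is that, assuming the daily
# ingest rate had been flat the whole time?
GB_PER_TB = 1024          # binary units; 1000 gives a similar answer
total_gb = 444 * GB_PER_TB
per_day_gb = 60
days = total_gb / per_day_gb
years = days / 365
print(round(years, 1))    # roughly 20 years of flat-rate data
```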
That’s enough for a single post. Corrections to my analysis are, of course, welcome.
In some future posts, I would like to investigate why an Oracle DW customer should expect only up to 3x “storage saving” from Exadata (update: here is the link), and later ask whether the savings really represent any savings at all.