Exadata HCC – Real-World Storage Savings

This is the first in a series of posts on Oracle’s Exadata Hybrid Columnar Compression (HCC), which is actually a great feature of the Oracle database. It is currently locked to Oracle-only storage (Exadata, ZFS appliance etc.), and Oracle marketing pushes it hard as providing “10x” compression to Oracle customers.

Oracle makes bold claims about HCC all over the place. For example, in this whitepaper from November 2012, the first paragraph claims “average storage savings can range from 10x to 15x” and the second paragraph illustrates it with a 100TB DB going down to 10TB, for 90TB of storage savings. After that, the paper switches to a real technical discussion of HCC.

So, what does HCC “10x” compression look like in real life? How much storage savings will Oracle customers see if they move to Exadata and start using HCC?
It is very hard to find unbiased analysis. So, to find out and start a hype-free discussion, I decided to get some real-world data from Oracle customers. Here are my findings.

To start, I needed access to an undisputed data source. Luckily, one can be found on Oracle’s web site – an impressive 76-page Exadata customer reference booklet from September 2012 containing a sample of 33 customer stories. Obviously, it is not very “representative” – reference customers tend to be more successful than average ones – but I think there is still a lot of value in analyzing it. Hey, maybe we’ll find that their storage savings are even larger than 10x-15x, who knows!

So, once I had data, I needed some methodology. I decided to pick an easy, fair one – go over the customer references and pick all of those that have a database-wide compression ratio (as a number or as before/after stats). This just shows the bottom line – what storage savings customers have really achieved, after all the fine print, trade-offs and technical considerations. Now, the savings are likely not 100% due to HCC – for example, as Exadata has a better scan rate, some indexes and materialized views might be removed; in other cases some unused tables might be identified and dropped during the migration process, etc. So, while I’ll attribute all the savings to HCC for convenience, keep in mind that any number we’ll see is very likely inflated and the real results are somewhat lower.
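To make the two ways of stating the bottom line comparable, here is a minimal sketch of the arithmetic, assuming sizes are given in the same unit before and after. The function names are my own, and the 100TB/10TB figures are the whitepaper's illustration, not customer data.

```python
def compression_ratio(before_tb, after_tb):
    """Database-wide ratio from before/after sizes (same unit for both)."""
    return before_tb / after_tb


def savings_fraction(ratio):
    """Fraction of the original storage that is no longer needed."""
    return 1 - 1 / ratio


# The whitepaper's illustration: a 100TB database shrinking to 10TB.
ratio = compression_ratio(100, 10)
print(f"{ratio:.1f}x compression, {savings_fraction(ratio):.0%} of storage saved")
```

Note that the relation is not linear: going from 10x to 15x only moves the saved fraction from 90% to about 93%.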

Before the actual results, here are three points regarding the analysis:

  1. I didn’t include customers with anecdotal storage saving numbers – only those with an end-to-end ratio or exact storage savings. Specifically, this ruled out two customers: Robi Axiata Limited, which provided table compression estimates (2x without performance impact, 7x-10x for history tables), and SK Telecom (10x compression from raw data, but no indication of total database compression ratio or savings).
  2. A lot of data warehouse references (11) do not mention any database-wide sizing or compression data at all! This includes Allegro Group, GfK Group Retail and Technology, Hong Kong Housing Authority, Immonet GmbH, LinkShare Corporation, Procter & Gamble, Robi Axiata Limited, SK Telecom, Targetbase, Unicoop Firenze and Yamazaki Baking Co., Ltd.
    This is quite puzzling – I’ll leave the speculation to the readers, as mine is not flattering and will sound like FUD to the Exadata “believers”.
  3. Some references only mention the DW size before Exadata. Those include Digicel Haiti (was 38TB, mentions “reduced storage requirements”), Hotwire, Inc. (was 10TB) and IDS GmbH – Analysis and Reporting Services (was 12TB).
    Again, the same mystery, if we believe in the “10x storage saving” theme.

So, only 10 out of the 33 reference customers (and out of 24 DW references) provided database size statistics (all of them are DWs). Here are the results, in alphabetical order:

[Image: Exadata_HCC_top_refs]

Well, there you have it. I found an average saving of at most 3.4x for the best Oracle DW references. Again, these results are inflated, both because they cover only the most successful reference customers and because other storage savings (dropping some of the indexes, materialized views and tables) get mixed in.

If anyone expected that HCC would deliver, for Oracle’s best references, an average compression of 10x-15x (or better), these results are probably quite shocking. To me, they actually seem better than expected, probably due to some of the artifacts I mentioned (and some skew due to the KyivStar one, which personally looks a bit fishy to me – they add 60GB of data per day and were at 444TB? That’s 20 years of data if the CDR rate was flat over the last 20 years…)
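The KyivStar sanity check is simple division. A rough sketch, assuming decimal units (1TB = 1000GB) and a flat daily load rate, with a helper name that is mine:

```python
def years_to_accumulate(total_tb, daily_gb, gb_per_tb=1000):
    """Years needed to reach total_tb at a constant daily load of daily_gb.

    Assumes decimal units by default (1TB = 1000GB); pass gb_per_tb=1024
    for binary units. The result is about 20 years either way.
    """
    days = total_tb * gb_per_tb / daily_gb
    return days / 365.25


# Figures from the KyivStar reference: 444TB total, 60GB added per day.
print(f"{years_to_accumulate(444, 60):.1f} years")
```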

That’s enough for a single post. Corrections to my analysis are of course welcome.
In future posts, I would like to investigate why an Oracle DW customer should expect only up to 3x “storage saving” from Exadata (update: here is the link: https://ofirm.wordpress.com/2013/02/06/exadata-hcc-where-are-the-savings/), and later ask if the savings really represent any savings at all.

8 thoughts on “Exadata HCC – Real-World Storage Savings”

  1. Pingback: Exadata HCC – where are the savings? | Big Data, Small Font

  2. Just curious, wouldn’t many of these customers be coming from “BASIC” compression, typical of a large non-Exadata Oracle DW? If you assume BASIC compression was 3x versus an uncompressed table, then wouldn’t the overall EHCC compression from uncompressed to EHCC be 3*3.4 = 10x?

    I suspect your 3.4x is the delta effect of EHCC over BASIC compression. The marketing folks are clearly choosing to compare EHCC to uncompressed tables (which is good valid marketing).

  3. Hi Dave,
    Thanks for your comment,
    Your analysis is spot on – earlier today I published a follow-up post regarding the “whys”, and I’ll update this post with a link to it: https://ofirm.wordpress.com/2013/02/06/exadata-hcc-where-are-the-savings/
    My point is that Oracle can proudly say HCC provides 10x compression vs. raw data, but to claim Exadata provides 10x storage savings for existing Oracle DW customers is highly misleading. See my follow-up post regarding Turkcell’s additional 3.4x compression and how Oracle calls it “8x storage savings”.

  4. Pingback: New Installation Cookbook « flashdba

  5. Pingback: Exadata Hybrid Columnar Compression (HCC) for (storage) dummies « Dirty Cache

  6. Pingback: Exadata HCC – storage savings revisited | Big Data, Small Font

  7. I have made large tests (over 1200!) with different data sources and different database models on three different Exadata Database machines (V2 and X2-2).
    My personal findings are that HCC achieves compression rates between 10 and 27 compared to non-compressed data for the total DWH with a load of one year of data and with HCC mode Compress for Query High.
    The best HCC real life compression factor on a fact table I have achieved so far with this mode was 52!

  8. Pingback: Oracle In-Memory Option: the good, the bad, the ugly | Big Data, Small Font
