this is a second in a series of posts on Oracle’s Exadata Hybrid Columnar Compression (HCC), which is actually a great feature of Oracle database. It is currently locked to Oracle-only storage (Exadata, ZFS appliance etc) and Oracle marketing pushes it hard as it provides “10x” compression to Oracle customers.
In the previous post, I showed that Oracle’s best data warehouse reference customers gets only an average “storage saving” of at most 3.4x. In this post, I’ll investigate why they don’t get the promised “10x-15x savings” that Oracle marketing occasionally mentions. In the next post, I plan to explain why I use double quotes around storage savings – why even that number is highly inflated.
10x compared to what? I remember that in one of the recent Oracle Openworld (was it this year?), Oracle had a marketing slide claiming Exadata provides 10x compression and non-Exadata provides 0x compression… (BTW – please post a link in the comments if you can share a copy). But leaving the funny / sad ExaMath aside, do non-Exadata customers enjoy any compression?
Well, as most Oracle DBAs will know, Oracle have introduced in 9i Release 2 (around 2002) a new feature that was called Data Segment Compression, which was renamed in 10g to Table Compression, in 11g release 1 to a catchy “compress for direct_load operations” and as of 11g release 2 is called Basic Compression. This feature is included in Enterprise Edition without extra cost. It provides dictionary-based compression at the Oracle’s table data block level. It is most suited for data warehousing, as the compression kicks in only during bulk load or table reorganization – updates and small inserts leaves data uncompressed.
What is the expected (and real world) average compression ratio of tables using this feature? The consensus is around 3x compression. Yes, for data warehousing on non-Exadata in the last decade Oracle provides 3x compression with Enterprise Edition!
Have Oracle Data Warehousing customers used this features? Absolutely yes! We actually used to have a slide deck at Oracle around Oracle 10g timeframe, with a slide that listed the top Oracle features used in Oracle data warehousing according to a major user group survey. As far as I can recall, table compression was at the top of the list with nearly 90% adoption… (please share corrections / link in the comments, if you can).
So, most of the mystery is already solved. Oracle used to provide 3x compression, now it provides 10x compression, so no wonder the best references customers are seeing about 3.4x savings…
Let me illustrate that with a simple and powerful example. If you open the Oracle references booklet that I shared in my previous post, and jump to page 63, you can read Turkcell reference story, one of Exadata’s most successful references. Now, I have to say that Turkcell has a very professional DBA team which I highly respect. I personally have had some great quality time with Ferhat Sengoul who manages a big DBA team in Turkcell when we last met about half an year ago, when we both presented in a DW conference in Hungary. Anyway, these are the compression results reported in page 67: “The data compression features of Oracle Database 10g helped address the issue to an extent, shrinking data volume from 250 TB to about 100 TB. But with Oracle Exadata’s hybrid columnar compression technology, Turkcell achieved exponentially greater space savings, cutting total data size to about 30 TB, an eightfold reduction.” This fits nicely with my previous explanation on table compression. So, a DW with 100TB on-disk was compressed to 30TB on-disk (which is quite awesome) and Oracle calls this achievement an 8x compression! (I have to give a lot of credit to Ferhat for providing an accurate technical disclosure here and in many other occasions, this is not trivial).
While you could probably stop reading here, I’d like to discuss some others details that have impact. The most obvious one – when switching to Exadata, not all tables will be moved to HCC for many reasons. Some are too small, some have lots of deletes or updates (or small inserts), others have high DML concurrency etc. In some cases it’s not worth paying the CPU price (on the expensive DB cores) for the compression – for example temp tables or staging tables. In addition, HCC doesn’t help in compressing LOB columns, which sometimes have non-neglectable storage contribution (if their format is not already compressed, customers can pay for the optional Advanced Compression Option to get LOB compression). Data dictionary is not compressed with HCC, and so are temp segments etc etc.
There are other objects in a typical DW that consumes space. The two main ones are indexes and materialized views (MV). Using HCC on a table does not reduce the size of each index. While MVs could be defined compressed with HCC, it is highly unlikely to be a smart move if the MV is incrementally refreshed. So, the compression doesn’t help here.
However, there is a side effect of Exadata typically having significantly more storage bandwidth – as in modern dedicated storage vs. for example 3-7 years old shared SAN. With more bandwidth, running more queries directly on the base tables (full scans) makes sense, so in a DW context some indexes and MVs will likely be dropped in the migration, saving some storage. Some similar effects will likely be seen when moving to other dedicated, high bandwidth storage solutions, depending on how much bandwidth is configured (but at least Exadata forces customers to get some chunks of the sizing right). You may be tempted to think that Exadata will actually deliver you imaginary scan rates – please watch at least part II of Kevin Closson’s “Critical Thinking Meets Exadata“, where he show that even Oracle’s own Exadata demo clearly demonstrates that the system is almost always CPU-bound, delivering just a few GB/s scans in practice.
Well, hopefully this post explains why even Oracle’s best references gets only up to about 3x storage savings from HCC. I still have another post planned in the coming weeks discussing if the “savings” do really save storage, and at what price.