Hortonworks brings out the heavy artillery

Just a few hours ago, Hortonworks posted several new posts on their blog regarding several stealth projects that they developed and (as can be expected from Hortonworks) are working to open source and contribute to the community.

The hottest one seems to me to be Tez, which is being donated to ASF as an incubator project. Its main aim is to accelerate the runtime of Hive, Pig and Cascading jobs. It defines a concept of a task, which comprises of a set of (Direct Acyclic Graph of) MapReduce jobs and treats it as a whole, streaming data between them without spilling (spooling) back to HDFS after each MapReduce pair.

Why is that good?

  • The actual writing and reading temporary results from HDFS (reducer output from the middle of the chain) can be very slow and resource intensive.
  • This is a blocking operation – so the whole parallel processing may stale due to a few slow tasks of one set of MapReducer (while speculative execution helps here, it doesn’t eliminate the problem).

It is very interesting that Hortonworks chose to publish this as a new infrastructure project. It seems they claim it to be a sort of next-gen, generalized MapReduce. I guess it started as a way to accelerate Hive, but it’s great that it will also help Pig (which also generates MapReduce chains) and Cascading (which helps developers easily create MapReduce chains).

This is part of Hortonworks effort to make Hive 100x fasterthe “Stinger Initiative. Continue reading

Exadata HCC – storage savings revisited

This is a third post in a series regarding Exadata HCC (Hybrid Columnar Compression) and the storage savings it brings to Oracle customers. In Part 1 I showed the real-world compression ratios of Oracle’s best DW references, in Part 2 I investigated why is that so, and in this part I’ll question the whole saving accounting.

So, we saw in part 1 that most Exadata DW references don’t mention the storage savings of HCC, but those who do show an average 3.4x “storage savings”.  Now let’s see what savings, if any, this brings. It all has to do with the compromises involved in giving up modern storage capabilities and the price to pay to when fulfilling these requirements with Exadata.

Let me start with a somewhat weaker but valid point. Modern storage software allows online storage software upgrade. A mission-critical database (or any database) shouldn’t be down or at risk when upgrading storage firmware or software. In order to achieve similar results with Exadata, the storage layer has to be configured as a three-way mirror (ASM High Redundancy). This is actually Oracle’s best practice, see for example the bottom of page 5 of the Exadata MAA HA paper. This configuration uses significantly more storage than any other solution on the market. This means that while the total size of all the data files might be smaller in Exadata thanks to HCC, you still need a surprisingly large raw volume of storage to support it, or you’ll have to compromise and always use offline storage software upgrades – likely the critical quarterly patch bundle, which could take at least an hour of downtime to apply, from what I read on the blogsphere.

To make it a bit more confusing, the Exadata X3 datasheet only mentions (in page 6) the  usable data capacity with 2-way mirror (ASM normal redundancy), even though the recommended configuration is 3-way mirror. I wonder if that has anything to do with later providing less net storage? Continue reading

Exadata HCC – where are the savings?

this is a second in a series of posts on Oracle’s Exadata Hybrid Columnar Compression (HCC), which is actually a great feature of Oracle database. It is currently locked to Oracle-only storage (Exadata, ZFS appliance etc) and Oracle marketing pushes it hard as it provides “10x” compression to Oracle customers.

In the previous post, I showed that Oracle’s best data warehouse reference customers gets only an average “storage saving” of at most 3.4x. In this post, I’ll investigate why they don’t get the promised “10x-15x savings” that Oracle marketing occasionally mentions. In the next post, I plan to explain why I use double quotes around storage savings – why even that number is highly inflated.

10x compared to what? I remember that in one of the recent Oracle Openworld (was it this year?), Oracle had a marketing slide claiming Exadata provides 10x compression and non-Exadata provides 0x compression… (BTW – please post a link in the comments if you can share a copy). But leaving the funny / sad ExaMath aside, do non-Exadata customers enjoy any compression?

Well, as most Oracle DBAs will know, Oracle have introduced in 9i Release 2 (around 2002) a new feature that was called Data Segment Compression, which was renamed in 10g to Table Compression, in 11g release 1 to a catchy “compress for direct_load operations” and as of 11g release 2 is called Basic Compression. This feature is included in Enterprise Edition without extra cost. It provides dictionary-based compression at the Oracle’s table data block level. It is most suited for data warehousing, as the compression kicks in only during bulk load or table reorganization – updates and small inserts leaves data uncompressed.

What is the expected (and real world) average compression ratio of tables using this feature? The consensus is around 3x compression. Yes, for data warehousing on non-Exadata in the last decade Oracle provides 3x compression with Enterprise Edition! Continue reading

Exadata HCC – Real-World Storage Savings

this is a first in a series of posts on Oracle’s Exadata Hybrid Columnar Compression (HCC), which is actually a  great feature of Oracle database. It is currently locked to Oracle-only storage (Exadata, ZFS appliance etc) and Oracle marketing pushes it hard as it provides “10x” compression to Oracle customers.

Oracle have bold claims regarding HCC all over. For example in this whitepaper from November 2012,  the first paragraph claims “average storage savings can range from 10x to 15x” and the second paragraph illustrates it with 100TB DB going down to 10TB, with 90TB of storage savings. After that, the paper switch to a real technical discussion on HCC.

So, what does HCC “10x” compression looks like in real life? How much storage savings will Oracle customers see if they move to Exadata and start using HCC?
It is very hard to find some unbiased analysis. So, to find out and start an hype-free discussion, I decided to get some real world data from Oracle customers. Here are my findings.

To start, I needed access to an undisputed data source. Luckily, one can be found on Oracle’s web site – an impressive 76-page long Exadata customer reference booklet from September 2012 containing a sample of 33 customer stories. Obviously, it is not very “representative” – reference customers tend to be more successful than the average ones -  but I think there is still a lot value in analyzing it.  Hey, maybe we’ll find that their storage saving is even larger than 10x-15x, who knows!

So, once I had data, Continue reading

Classical Big Data Reading – Google File System

This time I’ll discuss “The Google File System (GFS) by Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung from 2003. While GFS is propriety to Google, it was the direct inspiration for the Hadoop Distributed File System (HDFS), which is the fundamental layer of the popular Hadoop ecosystem.

In a nutshell, what GFS does is it takes a cluster of commodity servers with local disks and builds a fault-tolerant distributed file system on to top of it. The main innovation was picking up a new set of assumptions and optimizing the file system around a specific use case from Google:

  • The cluster is designed for low-end, low-cost nodes with local disks. Specifically, failure of disks and servers is handled automatically and transparently as they are expected to be a common, normal happening,not some rarely tested corner case.
  • The file system does not aim to be general-purpose. It is optimized for large, sequential I/O for both read and writes (high bandwidth, not low latency).
    In addition, GFS aims to hold relatively few (millions) files, mostly large ones (multi-GBs).

The architecture of GFS will look very familiar if you know HDFS. In GFS, there is a single master server (similar to HDFS Name Node) and  one chunkserver per server (similar to HDFS Data Node). The files are broken down to large, fixed-size chunks of 64MB (similar to HDFS blocks), which are stored as local linux files and are replicated for HA (three replicas by default). The master maintains all the metadata of the files and chunks in-memory. Clients get metadata from the master, but their read/write communications go directly to the chunkservers.The master logs metadata changes persistently to a local and remote operation log (similar to HDFS EditLog), but chunk location metadata is not persisted it is gathered from the chunkservers during master startup etc etc.

GFS Architecture

Cool features – surprisingly, GFS had in 2003 some features that are yet to appear in HDFS. Continue reading

Why too much concurrency always hurt database performance and how databases manage it

Following my previous two posts on concurrency, I’d like to explain why “too much” concurrency always hurts database performance (in any database), and discuss a couple of common database features that were designed to manage it, including an example from Oracle and Greenplum.

What do I mean by “too much” concurrency? Let’s play with an example. Let’s assume a small, simple database system where each SQL is processed by a single-threaded, CPU-bound process (maybe it is an in-memory database or all the relevant data is cached). Let’s further assume that each SQL takes 10 seconds to execute, and that the system can only efficiently execute four parallel SQLs. So, if we fire up four SQLs at a time every 10 seconds, we will get a throughput of 24 queries/minutes and average response time of 10 seconds. So far, life is good.

But what happens if we fire up 24 queries simultaneously once a minute? Let’s assume no interference between the SQLs and a fair scheduler that cycles between the processes many times per second. In that case, we will still get 24 queries per minute, but all queries will finish about 59-60 seconds, so the average response time will be almost 60 seconds – or six times slower with the same throughput. So, scheduling too many SQLs at once just drove response time through the roof without improving throughput!

Another way that “too much” concurrency hurts performance is Continue reading

Classical Big Data Reading – CAP Theorem

I decided to try writing once in a while a post on some of the classical papers and topics that had major effect on our big data technologies, and there is no better place to start that than the CAP Theorem.

The CAP Theorem by Eric Brewer was a philosophical fuel behind the so-called NoSQL movement, the battle cry that for a while united them all (at least in 2010). CAP stands for Consistency, Availability, (network) Partition tolerance and the theorem claims that in a distributed system, when there is an inevitable network partition (and the cluster breaks into two or more “islands”), you can’t guarantee both availability (for updates) and consistency. However, it was sometimes dumbed down to to a “Consistency, Availablity, Partition Tolerance – pick any two” slogan to explain why an eventual consistency model for a NoSQL database is legit. The discussion usually classified relational databases as “CA” and typically NoSQL databases as “AP”. Here is one example, and another representative one as an image:

Taken from http://blog.rizzif.com/2011/08/31/intro-to-nosql/ Continue reading