In the last 5 years, we’ve seen a proliferation of data far beyond expectations; these data are unstructured and distributed among multiple servers. For this new large and unstructured workload, it’s hard to come up with schema and to scale the system without impacting the performance; and yet, new generation of storage systems “Key Value Store” had to replace the decade-long relational database systems. Moving from SQL Row-oriented storage to NoSQL Column-oriented storage made it much faster to query these unstructured data sets with less overhead.
With distributed systems, there are 3 common properties we want to achieve: Consistency, Availability and Partition Tolerance. Consistency means that “Even-though there are multiple clients that are reading/writing data, all clients will see the same data at any given time”. Availability means that “For every read/write request, you get a quick response”. Partition Tolerance means that “When the network is partitioned into 2 parts that don’t talk to each other, the system continues to work”
CAP Theorem is generally described as the following: when you build a distributed system, you can only choose two of the three desirable properties: Consistency, availability, and partition Tolerance. Is this theory 100% correct? And do we always need to consider the CAP theorem as the building block for designing distributed systems?
Whether we are doing network partitioning within the data center or across data centers, we still desire the system to continue functioning normally if the internet gets disconnected, DNS not replying or TOR switch set for maintenance. So Partition Tolerance is essential for cloud computing; if we want to follow the CAP theorem, then the system has to choose between consistency and availability. But can’t we achieve both to a certain extent? Can’t we have eventual consistency with Always-Available system or Full consistency with acceptable level of availability?
If you examine Cassandra Key value store through the lens of the CAP theorem, Cassandra NoSQL chose availability over consistency. But you can also achieve consistency by designing an artistic replication strategy for multiple data center deployments (number of Replicas per Data center for each key). In Cassandra, a client sends a read/write request to the coordinator node in the Cassandra cluster, the coordinator uses partitioning to send query to all replica nodes. If any replica is down, the coordinator writes to all other replicas and keeps the write locally until the down replica comes up again. If all replicas are down, the coordinator buffers writes for few hours. So Given a key, suppose all writes for that key stopped then all the replicas will converge eventually to the latest write. Moreover, there are levels of consistency in Cassandra; normally we use Quorum which provides acceptable level of consistency with fast query response time but you can also use “ALL” consistency level which ensures strong consistency but of course with slower response time; So here is another trade off we need to make between consistency and latency.
HBase on the other side chose consistency over availability. HBase is a distributed database built on top of HDFS, consists of several tables, each table consists of a set of column families. In HBase, datasets are divided into chunks (HFiles) and a collection of HFiles form the HRegion, the HBase Master node assigns HFiles to HRegion Servers which is the daemon program that runs on each node in the cluster. Zookeeper synchronizes between tasks and guarantee consistency. The In-Memory representation is the magic of HBase; so for a write operation for instance, we write the operation in Append-only log in HDFS and then go and change the record in Memstore. The Append only log guarantees storage consistency; if the node fails we can replay this log.
To conclude, the CAP theorem is not 100% accurate and we are able to reach acceptable level of both consistency and availability by proper distribution of workload across the NoSQL cluster.