Apache Cassandra is an open-source, distributed database system. It is a NoSQL wide-column store designed to process large volumes of data across many commodity servers, and it offers high availability with no single point of failure. Cassandra provides robust support for clusters that span multiple data centres, using asynchronous, masterless replication that allows low-latency operations for all clients.

Main features

Distributed

Every node in the cluster has the same role, so there is no single point of failure. Data are distributed across the cluster, so each node holds different data, but there is no master node: any node can serve any request.
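A minimal sketch of how data can be spread over equal nodes without a master, assuming a simple hash ring. The names `token`, `Ring` and the node labels are hypothetical, and MD5 is only a stand-in hash chosen for illustration, not a statement about how any particular Cassandra version is configured.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 is a stand-in hash)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class Ring:
    """A toy token ring: each node owns the range that ends at its token."""

    def __init__(self, nodes):
        self._entries = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._entries]

    def owner(self, key: str) -> str:
        # First node token >= key token, wrapping around to the start of the ring.
        i = bisect.bisect_left(self._tokens, token(key)) % len(self._tokens)
        return self._entries[i][1]


ring = Ring(["node-a", "node-b", "node-c"])
# Every node can compute the same owner for a key, so any node can accept a request.
for key in ("user:1", "user:2", "user:3"):
    print(key, "->", ring.owner(key))
```

Because the owner depends only on the key and the list of nodes, no central directory is needed to route a request.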

Support for replication, including replication across data centres

Replication strategies are configurable. Cassandra is designed as a distributed system for deploying large numbers of nodes across multiple data centres. The key features of its distributed architecture are tailored specifically to deployment in multiple data centres, failover and recovery after a failure.
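The idea of a configurable, data-centre-aware replication strategy can be sketched as follows. This is an illustration under stated assumptions, not Cassandra's actual placement code: the function `replicas`, the topology and the data-centre names are hypothetical, and rack awareness is ignored.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 is a stand-in hash)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


def replicas(key, nodes, replication):
    """Choose the nodes that store `key`.

    nodes:       dict of node name -> data centre name (hypothetical topology)
    replication: dict of data centre name -> number of replicas wanted there
    """
    ring = sorted((token(name), name) for name in nodes)
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token(key))
    remaining = dict(replication)
    chosen = []
    # Walk the ring clockwise once, taking nodes whose data centre still needs replicas.
    for step in range(len(ring)):
        _, name = ring[(start + step) % len(ring)]
        dc = nodes[name]
        if remaining.get(dc, 0) > 0:
            chosen.append(name)
            remaining[dc] -= 1
    return chosen


topology = {
    "eu-1": "europe", "eu-2": "europe", "eu-3": "europe",
    "us-1": "america", "us-2": "america", "us-3": "america",
}
placement = replicas("user:42", topology, {"europe": 2, "america": 1})
print(placement)
```

Changing the `replication` mapping is the only thing needed to ask for more or fewer copies in a given data centre, which is the sense in which the strategy is configurable.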

Scalability

Cassandra is designed so that read and write throughput both increase linearly as new machines are added. The goal is zero downtime and no interruption to running applications.
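One reason machines can be added to a running cluster is that only part of the data has to move. The sketch below illustrates this, assuming a consistent-hash ring with one token per node. The helper names, node labels and key names are hypothetical, and the example does not measure throughput.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 is a stand-in hash)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


def owner(key: str, nodes) -> str:
    """Return the node whose token is the first one at or after the key's token."""
    ring = sorted((token(n), n) for n in nodes)
    tokens = [t for t, _ in ring]
    i = bisect.bisect_left(tokens, token(key)) % len(ring)
    return ring[i][1]


keys = [f"row-{i}" for i in range(1000)]
old_nodes = ["node-1", "node-2", "node-3", "node-4"]
new_nodes = old_nodes + ["node-5"]

# Keys whose owner changes when the fifth node joins.
moved = [k for k in keys if owner(k, old_nodes) != owner(k, new_nodes)]
print(f"{len(moved)} of {len(keys)} keys change owner")

# Every key that moves goes to the new node; existing nodes never swap data with each other.
moved_only_to_new_node = all(owner(k, new_nodes) == "node-5" for k in moved)
print(moved_only_to_new_node)
```

The existing nodes keep serving the keys they already own while the new node takes over its share, so the cluster does not need to stop for a full reshuffle.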

Fault tolerance

Data are replicated to multiple nodes automatically. Failed nodes can be replaced without interrupting the application.

Consistency

Cassandra is classed as an AP system in terms of the CAP theorem, which means that availability and partition tolerance are considered more important than consistency. Reads and writes offer a tunable level of consistency, anywhere on a scale from “writes never fail” to “block until all replicas are available for reading”.
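The trade-off behind tunable consistency can be shown with a little arithmetic. The sketch below assumes three of the consistency levels (ONE, QUORUM, ALL) and the usual rule that a read is guaranteed to see the latest write when the read and write replica sets must overlap. The function names are hypothetical helpers, not part of any driver API.

```python
def quorum(replication_factor: int) -> int:
    """A majority of the replicas."""
    return replication_factor // 2 + 1


def required_acks(level: str, replication_factor: int) -> int:
    """Number of replicas that must respond for the given consistency level."""
    levels = {
        "ONE": 1,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return levels[level]


def read_sees_latest_write(write_level: str, read_level: str, replication_factor: int) -> bool:
    """True when every read set must share at least one replica with every write set."""
    w = required_acks(write_level, replication_factor)
    r = required_acks(read_level, replication_factor)
    return w + r > replication_factor


for w, r in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ALL", "ONE")]:
    print(w, r, read_sees_latest_write(w, r, replication_factor=3))
```

Lower levels need fewer replicas to respond and so stay available when nodes are down; higher levels give stronger guarantees but block when replicas are unreachable.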

MapReduce support

Cassandra integrates with Hadoop, including support for MapReduce. It also supports Apache Pig and Apache Hive.

Query Language

Cassandra has its own query language, the Cassandra Query Language (CQL). CQL is a simple interface for accessing Cassandra. It adds a level of abstraction and hides the implementation details of the underlying structure.

Advantages of using Apache Cassandra

  • Availability. Replication means that data are available on multiple nodes or in multiple data centres. This behaviour is configurable. The details of the mechanism are hidden from the user, which makes it easy to use.
  • Simple node management. Adding and removing nodes is simple in Cassandra, and even replacing nodes after a catastrophic failure is relatively simple. There is no master node, so replication does not follow a master/slave model and all nodes are equal.
  • Availability of learning materials. It is relatively easy to get started with Cassandra because a lot of material is available. The user base is growing, so tips and help are becoming easier to find.

Disadvantages of using Apache Cassandra

  • Replication. Because replication copies whatever is written, mistakes are replicated along with correct data.
  • Repair. This concept is specific to Cassandra, and most users leave it to the database. It applies when a node fails and does not come back within a defined time window. While the node is unavailable, the other nodes hold the data that should have been written to it. If the node comes back within the time window, the data are moved to the intended node. Otherwise a repair has to be run. In neither case are data lost; what changes is mostly how the data are organised.
  • Searching. Ad hoc queries are not possible, because an arbitrary column cannot be searched. Indexes have to be added explicitly.
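The node-failure handling described in the list above (writes held for an unavailable node, handed back if it returns within the time window, repair otherwise) can be sketched as a toy model. This is a simplification under stated assumptions: `Replica`, `Coordinator` and `hint_window` are hypothetical names, time is passed in explicitly, and a single coordinator tracks a single replica.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)


class Coordinator:
    """Toy coordinator that holds writes for a replica while it is down."""

    def __init__(self, hint_window: float):
        self.hint_window = hint_window  # seconds; hypothetical parameter
        self.hints = []                 # (key, value) pairs waiting to be handed back
        self.down_since = None

    def mark_down(self, replica: Replica, now: float) -> None:
        replica.up = False
        self.down_since = now

    def write(self, replica: Replica, key, value, now: float) -> str:
        if replica.up:
            replica.data[key] = value
            return "written"
        if self.down_since is None:
            self.down_since = now
        if now - self.down_since <= self.hint_window:
            self.hints.append((key, value))
            return "hinted"
        return "missed"  # outside the window: only a repair can restore this write

    def recover(self, replica: Replica, now: float) -> bool:
        """Bring the replica back; return True if a repair is still needed."""
        needs_repair = (
            self.down_since is not None
            and now - self.down_since > self.hint_window
        )
        replica.up = True
        for key, value in self.hints:   # hand the held writes back to the node
            replica.data[key] = value
        self.hints.clear()
        self.down_since = None
        return needs_repair


node = Replica("node-b")
coord = Coordinator(hint_window=10.0)
coord.write(node, "k1", "v1", now=0.0)
coord.mark_down(node, now=1.0)
status = coord.write(node, "k2", "v2", now=5.0)
needs_repair = coord.recover(node, now=8.0)
print(status, needs_repair, node.data)
```

In this run the node returns inside the window, so the held write is handed back and no repair is flagged; a node that stays down longer than `hint_window` would come back with `needs_repair` set.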