Apache Cassandra is open-source, distributed database system. It is NoSQL system and it saves wide columns. It serves for the processing of great volume of data throughout commodity servers. It offers high availability with the absence of failure through one point. Cassandra offers robust support of clusters throughout different data centres with asynchronous replication without the control computer. It allows the low latency of operations for all the clients.
Every node in cluster has the same role. There are no points of failure. Data are distributed throughout cluster, so each node contains different data but there isn’t any control node, because each of the nodes can serve random request.
Replications strategies are possible to configurate. Cassandra is designed to be distributed through the system for deployment of great amounts of nodes throughout different data centres. The key features of the Cassandra distributed architecture are that it is specifically designed for the usage of a large numbers of data centres, overflow and recovery in case of failure.
Cassandra is designed to be possible to spread reading and writing linearly as are the new machines added. The goal is the zero downtime and interruption absence of the application run.
Data are replicated to more nodes automatically. Error nodes are possible to be replaced without application interruption.
Cassandra belongs among AP systems, which means that the availability and tolerance partition is considered to be more important than consistency. The writing and reading offer adjustable level of consistency, wherever on a scale from “writing never fails” to the “block all replicas, to be available for reading”.
Cassandra has integration Hadoop with the support of MapReduce. It also supports Apache Pig and Apache Hive.
Cassandra has its own Cassandra Query Language or CQL. CQL is simple interface for access to Cassandra. It adds abstraction level and hides details of structure implementation.