HBase is an open-source, non-relational, distributed database modeled on Google's Bigtable and written in Java. It is developed by the Apache Software Foundation as part of the Apache Hadoop project. It runs on top of HDFS (the Hadoop Distributed File System) and provides Bigtable-like capabilities for Hadoop.
It is designed for storing large quantities of sparse data, that is, small amounts of useful information embedded in a large collection of empty or unimportant data. Typical examples are finding the fifty largest items in a group of two billion records, or finding the non-zero items when they make up only a small fraction (less than 0.1 %) of the content.
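The sparse-data case and the "fifty largest items" example can be illustrated with a small stand-alone Java sketch. It is not HBase code: a `TreeMap` stands in for a table that stores only non-empty cells, and a bounded min-heap (`PriorityQueue`) returns the k largest values while holding at most k elements in memory. The class name `SparseTopK` and the randomly generated data are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Random;
import java.util.TreeMap;

// Illustrative sketch only (not HBase code): sparse storage plus a top-k query.
class SparseTopK {

    // Only non-zero values are stored; a missing key means "empty cell".
    private final TreeMap<Long, Long> cells = new TreeMap<>();

    void put(long rowKey, long value) {
        if (value != 0) {
            cells.put(rowKey, value);
        } else {
            cells.remove(rowKey); // writing an "empty" value frees the cell
        }
    }

    int storedCells() {
        return cells.size();
    }

    // Returns the k largest stored values in descending order.
    // The min-heap never holds more than k elements, so extra memory is O(k).
    List<Long> topK(int k) {
        PriorityQueue<Long> heap = new PriorityQueue<>(); // smallest value on top
        for (long v : cells.values()) {
            if (heap.size() < k) {
                heap.add(v);
            } else if (v > heap.peek()) {
                heap.poll(); // evict the smallest of the current top k
                heap.add(v);
            }
        }
        List<Long> result = new ArrayList<>(heap);
        result.sort(Collections.reverseOrder());
        return result;
    }

    public static void main(String[] args) {
        SparseTopK table = new SparseTopK();
        Random rnd = new Random(42);
        // One million logical rows, but only about 0.1 % of them carry a value.
        for (long row = 0; row < 1_000_000; row++) {
            long value = rnd.nextInt(1000) == 0 ? rnd.nextInt(1_000_000) + 1 : 0;
            table.put(row, value);
        }
        System.out.println("stored cells: " + table.storedCells());
        System.out.println("top 5: " + table.topK(5));
    }
}
```

The heap-based selection avoids sorting all stored values, which is the point when only a handful of results are needed from a very large set.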
Like the original Bigtable, HBase offers compression, in-memory operation and Bloom filters on a per-column basis. HBase tables can serve as input and output for MapReduce jobs in Hadoop and can be accessed through the Java API as well as through REST, Avro or Thrift. HBase is a column-oriented key-value store.
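A Bloom filter answers "is this key possibly in the set?" with no false negatives and a tunable rate of false positives; in HBase, such filters help avoid reading store files that cannot contain a requested row. The following is a minimal, self-contained Java sketch of the data structure itself, assuming an FNV-1a hash combined with `Arrays.hashCode` through double hashing. It is not HBase's own Bloom filter implementation, and the class name `SimpleBloomFilter` is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;

// Teaching sketch of a Bloom filter; not the implementation used inside HBase.
class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    SimpleBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Records a key by setting numHashes bit positions.
    void add(String rowKey) {
        byte[] key = rowKey.getBytes(StandardCharsets.UTF_8);
        int h1 = fnv1a(key);
        int h2 = Arrays.hashCode(key) | 1; // odd step: distinct probes when numBits is a power of two
        for (int i = 0; i < numHashes; i++) {
            bits.set(position(h1, h2, i));
        }
    }

    // false means "definitely absent"; true means "possibly present".
    boolean mightContain(String rowKey) {
        byte[] key = rowKey.getBytes(StandardCharsets.UTF_8);
        int h1 = fnv1a(key);
        int h2 = Arrays.hashCode(key) | 1;
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(position(h1, h2, i))) {
                return false;
            }
        }
        return true;
    }

    // Double hashing: the i-th probe is h1 + i * h2, reduced into the bit array.
    private int position(int h1, int h2, int i) {
        return Math.floorMod(h1 + i * h2, numBits);
    }

    // 32-bit FNV-1a hash over the key bytes.
    private static int fnv1a(byte[] data) {
        int hash = 0x811C9DC5;
        for (byte b : data) {
            hash ^= (b & 0xFF);
            hash *= 0x01000193;
        }
        return hash;
    }

    public static void main(String[] args) {
        SimpleBloomFilter filter = new SimpleBloomFilter(1 << 16, 5);
        filter.add("row-0001");
        filter.add("row-0042");
        System.out.println(filter.mightContain("row-0042")); // true: added keys are never missed
        System.out.println(filter.mightContain("row-9999")); // false unless a false positive occurs
    }
}
```

A larger bit array or a better-chosen number of hash functions lowers the false-positive rate at the cost of memory.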
HBase is not a direct replacement for a classic SQL database, but the Apache Phoenix project provides an SQL layer and a JDBC driver for HBase, which allows it to be integrated with a wide range of analytics and business intelligence tools. The Apache Trafodion project provides an SQL query engine with ODBC and JDBC drivers, as well as distributed ACID transaction protection across multiple statements, tables and rows, and uses HBase as its storage engine.
Many web services that process large volumes of data use HBase. The Facebook Messaging platform used HBase in the past but later migrated to MyRocks. Unlike relational and traditional databases, HBase does not support SQL scripts; the equivalent operations are instead written in Java, similarly to MapReduce applications.
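Because there is no SQL, an aggregation such as a `GROUP BY` count is expressed as map and reduce functions in Java. The sketch below imitates that pattern in plain Java over an in-memory table; it does not use the Hadoop or HBase APIs, and the row layout, the `info:city` column and the class name `RowCountByCity` are hypothetical. It assumes Java 9 or later for `Map.of`.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java imitation of the MapReduce pattern; a real job would read rows from HBase.
class RowCountByCity {

    // Map phase: each row emits (city, 1) if it has an "info:city" cell.
    // rowKey is unused here but mirrors a mapper's (key, value) input.
    static List<Map.Entry<String, Long>> map(String rowKey, Map<String, String> columns) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        String city = columns.get("info:city");
        if (city != null) {
            out.add(new SimpleEntry<>(city, 1L));
        }
        return out;
    }

    // Reduce phase: sum all counts emitted for the same key.
    static long reduce(String key, List<Long> counts) {
        long sum = 0;
        for (long c : counts) {
            sum += c;
        }
        return sum;
    }

    static Map<String, Long> run(Map<String, Map<String, String>> table) {
        // Shuffle: group intermediate values by key (kept sorted, as Hadoop does).
        Map<String, List<Long>> grouped = new TreeMap<>();
        for (Map.Entry<String, Map<String, String>> row : table.entrySet()) {
            for (Map.Entry<String, Long> kv : map(row.getKey(), row.getValue())) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Long> result = new TreeMap<>();
        for (Map.Entry<String, List<Long>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> table = new TreeMap<>();
        table.put("user1", Map.of("info:city", "Prague"));
        table.put("user2", Map.of("info:city", "Berlin"));
        table.put("user3", Map.of("info:city", "Prague"));
        table.put("user4", Map.of("info:email", "x@example.com")); // no city cell
        System.out.println(run(table)); // {Berlin=1, Prague=2}
    }
}
```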
Advantages of using HBase
Large data sets. HBase can store large volumes of data on top of HDFS and can aggregate and analyse billions of rows stored in HBase tables.
Database breaking point. Beyond a certain size, a traditional relational database can start to break down. HBase can handle databases of that size.
Fast processing. Compared with traditional databases, HBase can read and process data in less time.
Load distribution and failure recovery. HBase can recover from failures automatically, because it runs on top of HDFS, which is internally distributed and recovers automatically.
Scalability. HBase scales linearly and modularly.
Schema-less. HBase has no fixed schema of columns; a table only defines its column families, and columns within a family can be added freely.
Easy-to-use Java API for client access. Clients can access the database through a simple Java API.
Consistency. HBase offers consistent reads and writes, which makes it suitable for applications that place great emphasis on speed.
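The schema-less data model described above (fixed column families, free-form columns inside them) can be pictured as nested sorted maps: row key, then column family, then column qualifier, then value. The `ToyTable` class below is a hypothetical in-memory illustration that ignores timestamps and versions; the real Java client works with `Put` and `Get` objects instead. It assumes Java 9 or later for `Set.of`.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

// Teaching model of the HBase logical layout: row -> family -> qualifier -> value.
class ToyTable {
    private final Set<String> families; // fixed when the table is created
    private final NavigableMap<String, Map<String, NavigableMap<String, String>>> rows = new TreeMap<>();

    ToyTable(String... families) {
        this.families = Set.of(families);
    }

    // Qualifiers are free-form, but the column family must already exist.
    void put(String row, String family, String qualifier, String value) {
        if (!families.contains(family)) {
            throw new IllegalArgumentException("Unknown column family: " + family);
        }
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(family, f -> new TreeMap<>())
            .put(qualifier, value);
    }

    // Returns null when the cell was never written (sparse storage).
    String get(String row, String family, String qualifier) {
        Map<String, NavigableMap<String, String>> r = rows.get(row);
        if (r == null) {
            return null;
        }
        NavigableMap<String, String> f = r.get(family);
        return f == null ? null : f.get(qualifier);
    }

    public static void main(String[] args) {
        ToyTable users = new ToyTable("info", "stats");
        users.put("user1", "info", "name", "Alice");
        users.put("user2", "info", "email", "bob@example.com"); // different columns per row
        users.put("user2", "stats", "logins", "17");
        System.out.println(users.get("user2", "info", "email")); // bob@example.com
        System.out.println(users.get("user1", "info", "email")); // null
    }
}
```

Two rows of the same table can carry entirely different columns, and no space is used for the cells a row does not have.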
Disadvantages of using HBase
Single point of failure. If only one HMaster is used, its failure can bring the database down.
No transaction support. HBase does not support transactions; atomicity is guaranteed only for operations within a single row.
No JOINs. HBase has no built-in JOIN functionality; joins have to be implemented in a MapReduce layer instead.
Sorting only by key. An RDBMS can be indexed on arbitrary fields, whereas HBase is indexed and sorted exclusively by row key.
No built-in authentication. HBase has no built-in support for access control or authentication.
Not a full replacement. Because HBase does not support some major features of the traditional database model, it cannot fully replace a traditional database.
No SQL support. HBase does not support SQL, so it has no query optimizer.
Unpredictable latency. Integrating HBase with MapReduce jobs can cause unpredictable latencies.
Memory problems.
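The practical effect of sorting only by key can be shown with a sorted map standing in for HBase's key-ordered storage: a scan over a row-key prefix jumps directly to the matching range, while a query on any other field must examine every row because there is no secondary index. The class `KeyOrderedStore` and the composite `<user>#<date>` row keys are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Sketch: key-ordered storage makes key-range scans cheap and value lookups expensive.
class KeyOrderedStore {
    private final TreeMap<String, String> rows = new TreeMap<>();

    void put(String rowKey, String value) {
        rows.put(rowKey, value);
    }

    // Range scan over [prefix, prefix + '\uffff'): touches only rows in that key range.
    // (Simplification: keys containing '\uffff' right after the prefix are not covered.)
    SortedMap<String, String> scanPrefix(String prefix) {
        return rows.subMap(prefix, prefix + Character.MAX_VALUE);
    }

    // Lookup by value: a full scan whose cost grows with the size of the table.
    List<String> findByValue(String value) {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, String> e : rows.entrySet()) {
            if (e.getValue().equals(value)) {
                keys.add(e.getKey());
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        KeyOrderedStore store = new KeyOrderedStore();
        // Hypothetical composite row keys: <userId>#<date>
        store.put("alice#2024-01-02", "login");
        store.put("alice#2024-01-05", "purchase");
        store.put("bob#2024-01-03", "login");
        System.out.println(store.scanPrefix("alice#").keySet()); // [alice#2024-01-02, alice#2024-01-05]
        System.out.println(store.findByValue("login"));          // [alice#2024-01-02, bob#2024-01-03]
    }
}
```

This is why row-key design matters so much in HBase: the fields that queries filter on most often are usually placed at the front of the row key.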