What is Apache HBase?
Apache HBase is an open-source, distributed, multi-dimensional, column-oriented NoSQL database written in Java. Its tables look similar to those of a relational database, but it stores data in a column-oriented layout rather than row by row. HBase provides BigTable-like capabilities and runs on top of HDFS (Hadoop Distributed File System). When you need fast, random access to large data sets, HBase is a strong choice because it offers high throughput and low latency for read/write operations. Data in HBase consists of keys and values: each key points to a value, which is stored as an array of bytes and may represent a string or any other data. Large data sets can therefore be stored in HBase and shared across applications.
ZooKeeper is mainly helpful in managing large distributed environments, which form complex clusters that are difficult to manage properly. Source: Secure Apache Zookeeper with Kerberos
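To make the key/value description more concrete, here is a minimal in-memory sketch of the logical data model: rows kept in sorted key order, each row mapping a column name (family:qualifier) to a byte-array value. It is an illustration only. The class `TinyTable` and its methods are hypothetical and are not part of the HBase API.

```python
from bisect import insort


class TinyTable:
    """Toy model of an HBase-style table (hypothetical, not the HBase API)."""

    def __init__(self):
        self._rows = {}         # row key -> {"family:qualifier": bytes}
        self._sorted_keys = []  # row keys kept in sorted order

    def put(self, row: bytes, column: str, value: bytes) -> None:
        """Store a byte-array value under (row, column)."""
        if row not in self._rows:
            self._rows[row] = {}
            insort(self._sorted_keys, row)
        self._rows[row][column] = value

    def get(self, row: bytes, column: str):
        """Random access by row key; returns None if nothing is stored."""
        return self._rows.get(row, {}).get(column)

    def scan(self, start: bytes, stop: bytes):
        """Yield (row, columns) for start <= row < stop, in key order."""
        for key in self._sorted_keys:
            if start <= key < stop:
                yield key, self._rows[key]


table = TinyTable()
table.put(b"user#002", "info:name", b"Grace")
table.put(b"user#001", "info:name", b"Ada")
table.put(b"user#001", "info:email", b"ada@example.com")

print(table.get(b"user#001", "info:name"))
print([row for row, _ in table.scan(b"user#000", b"user#999")])
```

Keeping row keys sorted is what makes range scans cheap in this model, and it is also why row key design matters so much in practice.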
What is the architecture of Apache HBase?
There are three main components in the HBase architecture:
- HMaster
- Region Server
- ZooKeeper
HMaster
HMaster is the master server in HBase. It monitors all the region servers in the HBase cluster, assigns regions to them, and performs all DDL operations (creating a table, deleting a table, and so on). It also runs several background threads and handles load balancing and failover.
Region Server
Region servers run on the HDFS DataNodes of the Hadoop cluster. HBase tables are divided horizontally, by row key range, into regions, and each region holds its table's column families for that key range. An HBase cluster is essentially built up from these regions. The default maximum region size was 256 MB in early HBase releases; current releases default to 10 GB (hbase.hregion.max.filesize). Region servers serve read/write requests and manage the regions assigned to them.
Apache ZooKeeper
ZooKeeper plays the role of coordinator in HBase. It provides services such as maintaining configuration information, naming, distributed synchronization, and server failure notification. Clients use ZooKeeper to discover where the cluster's metadata is hosted and then communicate with the region servers directly.
The Kerberos protocol uses secret-key cryptography to provide secure communications over a non-secure network. Source: Introduction to Kerberos
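The way a row key maps to a region, and from there to a region server, can be sketched in a few lines. Regions are contiguous row-key ranges, so finding the owner of a key is a search over the sorted region start keys. All names below (`REGION_START_KEYS`, `REGION_SERVERS`, `locate_region_server`) and the host names are hypothetical. A real client resolves locations through ZooKeeper and the `hbase:meta` table, then contacts the region server directly.

```python
from bisect import bisect_right

# Hypothetical region map: each region starts at a row key and runs up to
# (but not including) the next start key. b"" means "from the first row".
REGION_START_KEYS = [b"", b"g", b"n", b"t"]
REGION_SERVERS = [
    "rs1.example.com",
    "rs2.example.com",
    "rs3.example.com",
    "rs1.example.com",  # one server can host several regions
]


def locate_region_server(row_key: bytes) -> str:
    """Return the server hosting the region whose key range contains row_key."""
    index = bisect_right(REGION_START_KEYS, row_key) - 1
    return REGION_SERVERS[index]


for key in (b"apple", b"grape", b"zebra"):
    print(key, "->", locate_region_server(key))
```

Because the start keys are sorted, the lookup is a binary search, which stays fast even when a table has many regions.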
How to secure Apache HBase with Kerberos?
Install the mapr-hbase-master and mapr-hbase-regionserver packages on the cluster. Then perform the following steps on the HBase nodes:
- Install the krb5 package and configure Kerberos.
- Set up the HBase Kerberos principal mapr/@. Each node has its own Kerberos identity and keytab.
- Generate the hbase.keytab file for the HBase Kerberos principal.
- Copy the hbase.keytab file to the /opt/mapr/conf directory.
- Change the ownership of the keytab file using chown.
- Set the permissions of the keytab file to 600 using chmod.
- Update the hbase-site.xml file by adding the following lines:
<property>
<name>hbase.security.authentication</name>
<value>kerberos</value>
</property>
<property>
<name>hbase.security.authorization</name>
<value>true</value>
</property>
<property>
<name>hbase.rpc.engine</name>
<value>org.apache.hadoop.hbase.ipc.SecureRpcEngine</value>
</property>
<property>
<name>hbase.regionserver.kerberos.principal</name>
<value>mapr/_HOST@<KERBEROS_REALM></value>
</property>
<property>
<name>hbase.master.kerberos.principal</name>
<value>mapr/_HOST@<KERBEROS_REALM></value>
</property>
- On a MapR cluster with security features enabled, replace the ${SIMPLE_LOGIN_OPTS} value of the MAPR_HBASE_SERVER_OPTS property with ${KERBEROS_LOGIN_OPTS}, and the value of the MAPR_HBASE_CLIENT_OPTS property with ${HYBRID_LOGIN_OPTS}.
- Also remove the -Dzookeeper.sasl.client=false setting from the definition of MAPR_HBASE_CLIENT_OPTS.
- These properties are located in the /opt/mapr/conf/env.sh file. On a MapR cluster with security features disabled, replace the ${SIMPLE_LOGIN_OPTS} value of both the MAPR_HBASE_SERVER_OPTS and MAPR_HBASE_CLIENT_OPTS properties in /opt/mapr/conf/env.sh with ${KERBEROS_LOGIN_OPTS}.
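The env.sh substitution for a cluster with security features enabled can be sketched as a small text transformation. The sample lines below are an assumption about how the two properties appear in env.sh (the real file may be laid out differently), and `secure_env` is a hypothetical helper. Treat this as an outline and review the actual file by hand.

```python
import re

# Assumed layout of the relevant env.sh lines; the real file may differ.
SAMPLE_ENV = '''\
MAPR_HBASE_SERVER_OPTS="${SIMPLE_LOGIN_OPTS}"
MAPR_HBASE_CLIENT_OPTS="${SIMPLE_LOGIN_OPTS} -Dzookeeper.sasl.client=false"
'''


def secure_env(text: str) -> str:
    """Apply the secure-cluster edits to env.sh text and return the result."""
    out = []
    for line in text.splitlines():
        if line.startswith("MAPR_HBASE_SERVER_OPTS="):
            line = line.replace("${SIMPLE_LOGIN_OPTS}", "${KERBEROS_LOGIN_OPTS}")
        elif line.startswith("MAPR_HBASE_CLIENT_OPTS="):
            line = line.replace("${SIMPLE_LOGIN_OPTS}", "${HYBRID_LOGIN_OPTS}")
            # Drop the flag that turns off SASL for the ZooKeeper client.
            line = re.sub(r"\s*-Dzookeeper\.sasl\.client=false", "", line)
        out.append(line)
    return "\n".join(out) + "\n"


print(secure_env(SAMPLE_ENV), end="")
```

Working on the text in memory and printing the result makes it easy to compare against the original before writing anything back to /opt/mapr/conf/env.sh.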
On the HBase region server nodes, add the following properties to hbase-site.xml:
<property>
<name>hbase.regionserver.keytab.file</name>
<value>/opt/mapr/conf/hbase.keytab</value>
</property>
<property>
<name>hbase.coprocessor.region.classes</name>
<value>org.apache.hadoop.hbase.security.token.TokenProvider,org.apache.hadoop.hbase.security.access.AccessController</value>
</property>
Make the corresponding update to hbase-site.xml on the HBase master node:
<property>
<name>hbase.master.keytab.file</name>
<value>/opt/mapr/conf/hbase.keytab</value>
</property>
<property>
<name>hbase.coprocessor.master.classes</name>
<value>org.apache.hadoop.hbase.security.access.AccessController</value>
</property>
Restart the HBase Master and RegionServer nodes.
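Around the restart, it may be worth confirming that the keytab is locked down as described in the earlier steps. The sketch below applies mode 600 and reads the mode back. It runs against a temporary stand-in file so it is safe to try anywhere, and `lock_down` is a hypothetical helper. On a real node the path would be /opt/mapr/conf/hbase.keytab, and the ownership change (chown) would be done separately with suitable privileges.

```python
import os
import stat
import tempfile


def lock_down(path: str) -> int:
    """Set owner-only read/write (600) on path and return the resulting mode bits."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)  # equivalent to chmod 600
    return stat.S_IMODE(os.stat(path).st_mode)


# Stand-in for the keytab so the example does not touch a real file.
with tempfile.NamedTemporaryFile(suffix=".keytab", delete=False) as handle:
    demo_path = handle.name

mode = lock_down(demo_path)
print(oct(mode))
os.remove(demo_path)
```

Reading the mode back after setting it catches cases where the change silently did not take effect, for example on a file system that ignores POSIX permissions.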
A Distributed Approach
A distributed and scalable platform gives enterprises real-time read/write access to large datasets, which in turn helps improve consistency and scalability.