HBase is a distributed, column-oriented data store modeled on Google's Bigtable and designed to provide quick random access to huge amounts of structured data. This tutorial provides an introduction to HBase, the procedure for setting up HBase on the Hadoop Distributed File System (HDFS), and ways to interact with the HBase shell. It also describes how to connect to HBase using Java, and how to perform basic operations on HBase using Java.
Prerequisites:
1. Java JDK (This demo uses JDK version 1.7.0_67)
Make sure the JAVA_HOME environment variable points to the JDK, and that the directory containing the java executable, $JAVA_HOME/bin, is in the PATH environment variable.
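As a quick sanity check for this prerequisite, the sketch below tests whether a directory looks like a JDK home. The function name check_java_home is hypothetical (it is not part of the JDK or HBase), and it only checks that an executable bin/java exists under the given directory.

```shell
# Hypothetical helper: check that a directory looks like a usable JDK home.
# Usage: check_java_home "$JAVA_HOME"
check_java_home() {
  # An empty argument means JAVA_HOME was never set.
  if [ -z "$1" ]; then
    echo "JAVA_HOME is not set"
    return 1
  fi
  # The java launcher is expected at <JAVA_HOME>/bin/java.
  if [ ! -x "$1/bin/java" ]; then
    echo "no executable java found under $1/bin"
    return 1
  fi
  echo "JAVA_HOME looks usable: $1"
}
```

For example, check_java_home "$JAVA_HOME" prints a short message and returns a non-zero status when the variable is empty or bin/java is missing.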
2. SSH configured
Make sure that the machines in the Hadoop cluster can ssh to each other without a password. In a single-node setup, the machine should be able to ssh to localhost without a password.
$ ssh-keygen -t rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
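To confirm that the key setup took effect, something along these lines may help. Both function names are hypothetical helpers written for this post. The first only compares files; the second relies on the standard OpenSSH BatchMode option, which makes ssh fail instead of prompting for a password.

```shell
# Hypothetical helper: succeed if every line of the public key file
# appears verbatim as a line in the authorized_keys file.
# Usage: key_is_authorized ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
key_is_authorized() {
  [ -f "$1" ] && [ -f "$2" ] && grep -qxFf "$1" "$2"
}

# Hypothetical helper: succeed only if ssh can log in to the host
# without asking for a password (BatchMode disables prompts).
# Usage: can_ssh_without_password localhost
can_ssh_without_password() {
  ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}
```

On a single-node setup, can_ssh_without_password localhost should return 0 once the three commands above have been run.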
3. Before you start configuring HBase, you need a running Hadoop installation, which serves as the storage for HBase (HBase stores its data in the Hadoop Distributed File System). Please refer to the post Hadoop-Yarn Installation in Pseudo-distributed mode before continuing.
Installing and Configuring HBase
1. Download the latest stable version of HBase from http://www.interior-dsgn.com/apache/hbase/stable/ using the wget command, and extract it using tar with the -zxvf options. See the following commands.
$ wget http://www.interior-dsgn.com/apache/hbase/stable/hbase-1.1.4-bin.tar.gz
$ tar -zxvf hbase-1.1.4-bin.tar.gz
2. Go to <HBASE_HOME>/conf/hbase-env.sh
Export the JAVA_HOME environment variable in the hbase-env.sh file as shown below:
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_67
Go to <HBASE_HOME>/conf/hbase-site.xml
Inside the hbase-site.xml file, you will find the <configuration> and </configuration> tags. Between them, set the HBase root directory with the property named hbase.rootdir, along with the ZooKeeper data directory and the distributed-mode flag, as shown below.
<configuration>
  <!-- Set the path where you want HBase to store its files. -->
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
  </property>
  <!-- Set the path where you want HBase to store its built-in ZooKeeper files. -->
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/hadoop/zookeeper</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
</configuration>
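If you would rather generate this file than edit it by hand, a minimal sketch could look like the following. The function write_hbase_site and its three arguments are hypothetical; the property names are the same three used in this post. Note that it overwrites any existing hbase-site.xml in the target directory.

```shell
# Hypothetical helper: write a minimal hbase-site.xml.
# Usage: write_hbase_site CONF_DIR ROOT_DIR ZOOKEEPER_DATA_DIR
write_hbase_site() {
  mkdir -p "$1"
  # Unquoted heredoc delimiter so that $2 and $3 are expanded.
  cat > "$1/hbase-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <!-- Where HBase stores its files -->
  <property>
    <name>hbase.rootdir</name>
    <value>$2</value>
  </property>
  <!-- Where the built-in ZooKeeper stores its files -->
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>$3</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
</configuration>
EOF
}
```

With the values from this post, the call would be write_hbase_site "<HBASE_HOME>/conf" "hdfs://localhost:9000/hbase" "/home/hadoop/zookeeper", with <HBASE_HOME> replaced by your actual HBase folder.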
3. Starting HBase
After configuration is complete, go to the HBase home folder and start HBase using the following command.
$ bin/start-hbase.sh
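One way to confirm that the daemons came up is to look at the output of the JDK's jps tool. The sketch below is a hypothetical helper that reads jps output on standard input and prints the name of each expected daemon it cannot find. It assumes HBase is managing its own ZooKeeper, in which case an HQuorumPeer process should appear next to HMaster and HRegionServer; if you run an external ZooKeeper, drop that name from the list.

```shell
# Hypothetical helper: read `jps` output on stdin and print each expected
# HBase daemon that is NOT listed. Prints nothing when all are present.
# Usage: jps | missing_hbase_daemons
missing_hbase_daemons() {
  jps_output=$(cat)
  # Assumed daemon names for a pseudo-distributed setup with the
  # HBase-managed ZooKeeper.
  for daemon in HMaster HRegionServer HQuorumPeer; do
    if ! printf '%s\n' "$jps_output" | grep -qw "$daemon"; then
      echo "$daemon"
    fi
  done
}
```

Running jps | missing_hbase_daemons after start-hbase.sh should print nothing if all three processes are alive.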
4. Checking the HBase Directory in HDFS
HBase creates its directory in HDFS. To see the created directory, go to the Hadoop home folder and type the following command.
$ ./bin/hadoop fs -ls /hbase
If everything goes well, it will give you the following output.
Found 7 items
drwxr-xr-x - hbase users 0 2014-06-25 18:58 /hbase/.tmp
drwxr-xr-x - hbase users 0 2014-06-25 21:49 /hbase/WALs
drwxr-xr-x - hbase users 0 2014-06-25 18:48 /hbase/corrupt
drwxr-xr-x - hbase users 0 2014-06-25 18:58 /hbase/data
-rw-r--r-- 3 hbase users 42 2014-06-25 18:41 /hbase/hbase.id
-rw-r--r-- 3 hbase users 7 2014-06-25 18:41 /hbase/hbase.version
drwxr-xr-x - hbase users 0 2014-06-25 21:49 /hbase/oldWALs
5. Run a sample
Go to <HBASE_HOME> and start the HBase shell:
$ bin/hbase shell
Create a table –
Use the create command to create a new table. We must specify the table name and the ColumnFamily name:
hbase(main):001:0> create 'test', 'cf'
0 row(s) in 3.3340 seconds
=> Hbase::Table - test
Populating the data –
Here, we insert three values, one at a time. The first insert is at row1, column cf:a, with a value of value1. A column in HBase consists of a column family prefix, cf in this example, followed by a colon and then a column qualifier suffix, a in the case below:
hbase(main):008:0> put 'test', 'row1', 'cf:a', 'value1'
0 row(s) in 1.3280 seconds
hbase(main):009:0> put 'test', 'row2', 'cf:b', 'value2'
0 row(s) in 0.0340 seconds
hbase(main):010:0> put 'test', 'row3', 'cf:c', 'value3'
0 row(s) in 0.0150 seconds
We can get data from HBase using scan. We can limit our scan, but for now, all data is fetched:
hbase(main):011:0> scan 'test'
ROW COLUMN+CELL
row1 column=cf:a, timestamp=1427820136323, value=value1
row2 column=cf:b, timestamp=1427820144111, value=value2
row3 column=cf:c, timestamp=1427820153067, value=value3
3 row(s) in 0.1650 seconds
To get a single row of data at a time, we can use the get command.
hbase(main):012:0> get 'test', 'row1'
COLUMN CELL
cf:a timestamp=1427820136323, value=value1
1 row(s) in 0.0650 seconds
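The same session can also be replayed without typing each command, because the HBase shell accepts a command file as an argument. The sketch below writes the commands from this post into a file; the function name write_sample_commands and the file path are hypothetical, and the final exit line makes the shell quit when it reaches the end of the file.

```shell
# Hypothetical helper: write the shell commands used in this post to a file.
# Usage: write_sample_commands /tmp/sample_commands.txt
write_sample_commands() {
  # Quoted heredoc delimiter: the text is written exactly as shown.
  cat > "$1" <<'EOF'
create 'test', 'cf'
put 'test', 'row1', 'cf:a', 'value1'
put 'test', 'row2', 'cf:b', 'value2'
put 'test', 'row3', 'cf:c', 'value3'
scan 'test'
get 'test', 'row1'
exit
EOF
}
```

From <HBASE_HOME>, running write_sample_commands /tmp/sample_commands.txt followed by bin/hbase shell /tmp/sample_commands.txt should execute the commands in order. If the test table already exists from the interactive session, the create line will report an error.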
🙂