HBase Installation in Pseudo-Distributed Mode

HBase is a distributed, column-oriented data store modeled on Google’s Bigtable and designed to provide quick random access to huge amounts of structured data. This tutorial gives a short introduction to HBase, walks through setting it up on top of the Hadoop Distributed File System (HDFS), and shows how to interact with it through the HBase shell. It also describes how to connect to HBase using Java and how to perform basic operations on HBase from Java.

Prerequisites:

1. Java JDK (This demo uses JDK version 1.7.0_67)

Make sure the JAVA_HOME environment variable points to the JDK, and that the directory containing the java executable, $JAVA_HOME/bin, is in the PATH environment variable.
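The two checks above can be scripted. The following is only a sketch: check_java_env is a hypothetical helper name, not part of the JDK or HBase, and it assumes a POSIX-style shell.

```shell
# Hypothetical helper: check a JDK location against a PATH value.
# $1 = candidate JAVA_HOME, $2 = colon-separated PATH to inspect
check_java_env() {
    java_home=$1
    path_list=$2
    if [ -z "$java_home" ]; then
        echo "JAVA_HOME is not set"
        return 1
    fi
    if [ ! -x "$java_home/bin/java" ]; then
        echo "no java executable under $java_home/bin"
        return 1
    fi
    # Wrap both sides in colons so only whole PATH entries match.
    case ":$path_list:" in
        *":$java_home/bin:"*) echo "ok" ;;
        *) echo "$java_home/bin is not in PATH"; return 1 ;;
    esac
}

# Typical call (commented out so the snippet has no side effects):
# check_java_env "$JAVA_HOME" "$PATH"
```

The helper prints ok and returns 0 only when all three conditions hold; otherwise it prints the first problem it finds.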

2. SSH configured

Make sure that the machines in the Hadoop cluster can ssh to each other without a password. In a single-node setup, the machine should be able to ssh to localhost without a password prompt.

 $ ssh-keygen -t rsa 
 $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys 
 $ chmod 0600 ~/.ssh/authorized_keys
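To confirm the result, something like the following may help. It is a sketch under assumptions: both function names are hypothetical, and the permission check is deliberately stricter than what OpenSSH requires (with its default StrictModes setting, sshd rejects keys when ~/.ssh or authorized_keys is writable by group or others).

```shell
# Hypothetical helper: verify conservative permissions on an .ssh directory.
# $1 = path to the .ssh directory
check_ssh_perms() {
    ssh_dir=$1
    # The first 10 characters of "ls -ld" output are the type and mode bits.
    dir_mode=$(ls -ld "$ssh_dir" | cut -c1-10)
    key_mode=$(ls -ld "$ssh_dir/authorized_keys" | cut -c1-10)
    if [ "$dir_mode" != "drwx------" ]; then
        echo "expected drwx------ on $ssh_dir, found $dir_mode"
        return 1
    fi
    if [ "$key_mode" != "-rw-------" ]; then
        echo "expected -rw------- on authorized_keys, found $key_mode"
        return 1
    fi
    echo "ok"
}

# Hypothetical helper: succeeds only if ssh works without any prompt.
# BatchMode=yes makes ssh fail instead of asking for a password.
passwordless_ssh_ok() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# Typical calls (commented out so the snippet has no side effects):
# check_ssh_perms ~/.ssh
# passwordless_ssh_ok localhost && echo "password-less ssh works"
```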

3. Before configuring HBase, you need a running Hadoop installation, which provides the storage for HBase (HBase stores its data in the Hadoop Distributed File System). Please refer to the Hadoop-Yarn Installation in Pseudo-distributed mode post before continuing.

Installing and Configuring HBase

1. Download the latest stable version of HBase from http://www.interior-dsgn.com/apache/hbase/stable/ using the wget command, and extract it using tar with the zxvf options, as shown below.

$ wget http://www.interior-dsgn.com/apache/hbase/stable/hbase-1.1.4-bin.tar.gz
$ tar -zxvf hbase-1.1.4-bin.tar.gz
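Mirrors only carry current releases, so the 1.1.4 link above may stop working over time. As a fallback, the sketch below builds a download URL for the Apache archive. The function name is hypothetical, and the URL layout is an assumption based on how archive.apache.org organizes HBase releases; verify it in a browser before relying on it.

```shell
# Hypothetical helper: build an archive URL for a given HBase version.
# Assumes the layout https://archive.apache.org/dist/hbase/<version>/.
hbase_archive_url() {
    version=$1
    echo "https://archive.apache.org/dist/hbase/$version/hbase-$version-bin.tar.gz"
}

# Typical use (commented out so the snippet does not download anything):
# wget "$(hbase_archive_url 1.1.4)"
# tar -zxvf hbase-1.1.4-bin.tar.gz
```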

 

2. Open <HBASE_HOME>/conf/hbase-env.sh

Export the JAVA_HOME environment variable in hbase-env.sh as shown below:

export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_67

Then open <HBASE_HOME>/conf/hbase-site.xml

 

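Inside the hbase-site.xml file, you will find the <configuration> and </configuration> tags. Within them, set the HBase root directory with the “hbase.rootdir” property, the ZooKeeper data directory with “hbase.zookeeper.property.dataDir”, and enable distributed mode with “hbase.cluster.distributed”, as shown below.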

<configuration>
   <!-- Where HBase stores its files. The host and port must match the
        NameNode address (fs.defaultFS) in Hadoop's core-site.xml. -->
   <property>
      <name>hbase.rootdir</name>
      <value>hdfs://localhost:9000/hbase</value>
   </property>

   <!-- Where HBase's built-in ZooKeeper stores its files. -->
   <property>
      <name>hbase.zookeeper.property.dataDir</name>
      <value>/home/hadoop/zookeeper</value>
   </property>

   <!-- Run the HBase daemons as separate processes (needed for pseudo-distributed mode). -->
   <property>
      <name>hbase.cluster.distributed</name>
      <value>true</value>
   </property>
</configuration>
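A typo in this file usually only shows up when HBase fails to start, so it can be worth reading the values back first. The sketch below uses a hypothetical helper, get_hbase_property, which is not an HBase tool. It is a plain text match rather than a real XML parser, and it assumes each <name> element is followed by its <value>, as in the file above.

```shell
# Hypothetical helper: print the value of one property from hbase-site.xml.
# $1 = path to hbase-site.xml, $2 = property name
get_hbase_property() {
    awk -v want="$2" '
        /<name>/ {
            # Remember the most recent property name seen.
            name = $0
            sub(/.*<name>/, "", name)
            sub(/<\/name>.*/, "", name)
        }
        /<value>/ && name == want {
            value = $0
            sub(/.*<value>/, "", value)
            sub(/<\/value>.*/, "", value)
            print value
            exit
        }
    ' "$1"
}

# Typical call (commented out; adjust the path to your installation):
# get_hbase_property "$HBASE_HOME/conf/hbase-site.xml" hbase.rootdir
```

Here HBASE_HOME is assumed to be a shell variable pointing at the extracted HBase directory; the post itself only uses it as a placeholder.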

3. Starting HBase

After configuration is complete, change to the HBase home directory and start HBase using the following command.

$ bin/start-hbase.sh
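One way to confirm that startup succeeded is to look at the JDK's jps output. With the configuration used in this post (distributed mode, with HBase managing its own ZooKeeper), the expected daemon names are HMaster, HRegionServer and HQuorumPeer. The helper below is a hypothetical sketch that reads jps output on standard input and prints whichever of those names are missing.

```shell
# Hypothetical helper: list expected HBase daemons absent from jps output.
# Reads lines of the form "<pid> <MainClass>" on stdin.
missing_hbase_daemons() {
    jps_output=$(cat)
    missing=""
    for daemon in HMaster HRegionServer HQuorumPeer; do
        # awk exits 0 only if some line has this exact class name.
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            missing="$missing $daemon"
        fi
    done
    # Strip the leading space left by the concatenation above.
    echo "${missing# }"
}

# Typical call (commented out; requires a JDK on the PATH):
# jps | missing_hbase_daemons
```

An empty result means all three daemons were found. If a name is printed, the logs directory under the HBase home is the usual place to look next.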

4. Checking the HBase Directory in HDFS

HBase creates its directory in HDFS. To see the created directory, change to the Hadoop home directory and run the following command.

 $ ./bin/hadoop fs -ls /hbase

If everything goes well, you will see output similar to the following.

Found 7 items
drwxr-xr-x - hbase users 0 2014-06-25 18:58 /hbase/.tmp
drwxr-xr-x - hbase users 0 2014-06-25 21:49 /hbase/WALs
drwxr-xr-x - hbase users 0 2014-06-25 18:48 /hbase/corrupt
drwxr-xr-x - hbase users 0 2014-06-25 18:58 /hbase/data
-rw-r--r-- 3 hbase users 42 2014-06-25 18:41 /hbase/hbase.id
-rw-r--r-- 3 hbase users 7 2014-06-25 18:41 /hbase/hbase.version
drwxr-xr-x - hbase users 0 2014-06-25 21:49 /hbase/oldWALs
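If you would rather check this listing from a script than by eye, a small filter can look for the two marker files shown above, hbase.id and hbase.version. This is only a sketch: rootdir_initialized is a hypothetical name, and it assumes the root directory is /hbase as configured earlier.

```shell
# Hypothetical helper: read "hadoop fs -ls /hbase" output on stdin and report
# whether both marker files from the listing above are present.
rootdir_initialized() {
    awk '
        $NF == "/hbase/hbase.id"      { id = 1 }
        $NF == "/hbase/hbase.version" { version = 1 }
        END {
            if (id && version) print "initialized"
            else print "incomplete"
        }
    '
}

# Typical call (commented out; requires a running HDFS):
# ./bin/hadoop fs -ls /hbase | rootdir_initialized
```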

5. Running a sample example

Go to <HBASE_HOME> and start the HBase shell:

$ bin/hbase shell

Create a table

Use the create command to create a new table. We must specify the table name and the ColumnFamily name:

hbase(main):001:0> create 'test', 'cf'
0 row(s) in 3.3340 seconds

=> Hbase::Table - test

Populating the data

Here, we insert three values, one at a time. The first insert is at row1, column cf:a, with a value of value1. Columns in HBase consist of a column family prefix, cf in this example, followed by a colon and then a column qualifier suffix, a in the case below:

hbase(main):008:0> put 'test', 'row1', 'cf:a', 'value1'
0 row(s) in 1.3280 seconds

hbase(main):009:0> put 'test', 'row2', 'cf:b', 'value2'
0 row(s) in 0.0340 seconds

hbase(main):010:0> put 'test', 'row3', 'cf:c', 'value3'
0 row(s) in 0.0150 seconds

Scanning the table for all data at once

We can get data from HBase using scan. We can limit our scan, but for now, all data is fetched:

hbase(main):011:0> scan 'test'
ROW                               COLUMN+CELL                                                                                     
 row1                             column=cf:a, timestamp=1427820136323, value=value1                                              
 row2                             column=cf:b, timestamp=1427820144111, value=value2                                              
 row3                             column=cf:c, timestamp=1427820153067, value=value3                                              
3 row(s) in 0.1650 seconds

Getting a single row of data

To get a single row of data at a time, we can use the get command.

hbase(main):012:0> get 'test', 'row1'
COLUMN                            CELL                                                                                            
 cf:a                             timestamp=1427820136323, value=value1                                                           
1 row(s) in 0.0650 seconds
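The same session can be replayed without typing each command, because the HBase shell accepts a command file as its argument. The sketch below writes the commands from this post to a file; the function name and the /tmp path are hypothetical choices for illustration.

```shell
# Hypothetical helper: write the shell commands used in this post to a file.
# $1 = path of the command file to create
write_hbase_demo_script() {
    cat > "$1" <<'EOF'
create 'test', 'cf'
put 'test', 'row1', 'cf:a', 'value1'
put 'test', 'row2', 'cf:b', 'value2'
put 'test', 'row3', 'cf:c', 'value3'
scan 'test'
get 'test', 'row1'
exit
EOF
}

# Typical use from <HBASE_HOME> (commented out; requires a running HBase):
# write_hbase_demo_script /tmp/hbase_demo.txt
# bin/hbase shell /tmp/hbase_demo.txt
```

The final exit line returns control to the operating system prompt once the commands finish. Note that create fails if the test table already exists from the interactive session above.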

 

🙂
