
Installing HBase in fully distributed mode

Pre-requisite

1. Java JDK (This demo uses JDK version 1.7.0_67)
Make sure the JAVA_HOME environment variable points to the JDK, and that the Java executable's directory, i.e., $JAVA_HOME/bin, is on the PATH.

2. Make sure you have installed Hadoop on your cluster. To install it, refer to my post Installing Hadoop in fully distributed mode (included below).

Installing and Configuring HBase

Assumptions –
For the purpose of clarity and ease of expression, I'll assume we are setting up a cluster of 2 nodes with the following IP addresses:

10.10.10.1 – HMaster
10.10.10.2 – HRegionServer

In my case, the HMaster node is also the NameNode, and the region server node is the DataNode.

1. Download hbase-1.1.4-bin.tar.gz from http://www.apache.org/dyn/closer.cgi/hbase/ and extract it to a path on your machine. From here on, I'll refer to the HBase installation root as $HBASE_INSTALL_DIR.

2. Edit the file /etc/hosts on the master machine and add the following lines.

10.10.10.1 master
10.10.10.2 slave

Note: Run the command "ping master" to check that the master hostname resolves to its actual IP address and not to the localhost address.
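The reply should show the IP from /etc/hosts, along the lines of:

$ ping -c 1 master
PING master (10.10.10.1) 56(84) bytes of data.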

3. Since these machines already run Hadoop, password-less SSH has already been set up between them.

4. Open the file $HBASE_INSTALL_DIR/conf/hbase-env.sh and set JAVA_HOME.

     export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_67

5. Configure Hbase

Case I – When HBase manages the ZooKeeper ensemble

Open the file $HBASE_INSTALL_DIR/conf/hbase-env.sh and set HBASE_MANAGES_ZK to true to indicate that HBase should manage the ZooKeeper ensemble internally.

export HBASE_MANAGES_ZK=true

Open the file $HBASE_INSTALL_DIR/conf/hbase-site.xml and add the following properties.

<configuration>
  <property>
    <name>hbase.master</name>
    <value><master-hostname>:60000</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://<master-hostname>:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
</configuration>
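For example, with the /etc/hosts entries from step 2, the placeholders get filled in as below (the port 9000 in hbase.rootdir must match the fs.default.name port in Hadoop's core-site.xml):

<property>
  <name>hbase.master</name>
  <value>master:60000</value>
</property>
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://master:9000/hbase</value>
</property>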

Case II – When the ZooKeeper ensemble is managed externally

Open the file $HBASE_INSTALL_DIR/conf/hbase-env.sh and set:

   export HBASE_MANAGES_ZK=false

For this configuration, add two more properties to hbase-site.xml:

<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value><master-hostname></value>
</property>

Note: In our case, ZooKeeper and the HBase master run on the same machine.
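In case you still have to set up that external ZooKeeper, a minimal standalone conf/zoo.cfg would look like the sketch below; the dataDir path here is only an assumption for this demo, and clientPort must match hbase.zookeeper.property.clientPort above:

tickTime=2000
dataDir=/home/impadmin/zookeeper/data
clientPort=2181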

6. Edit the $HBASE_INSTALL_DIR/conf/regionservers file on all the HBase cluster nodes and add the hostnames (or IP addresses) of all the region server nodes. For example:

10.10.10.2

7. Repeat the same procedure on all the master and region server nodes.

Start and Stop the HBase Cluster

8. Starting the HBase Cluster

Before starting the HBase cluster, start ZooKeeper if it is externally managed. Go to <zookeeper_home>/bin and run:

     ./zkServer.sh start
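To verify that ZooKeeper came up, check its status from the same directory:

     ./zkServer.sh status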

 

We need to start the daemons only on the hbase-master machine; the start script will start the daemons on all region server machines.

Execute the following command to start the hbase cluster.

    $HBASE_INSTALL_DIR/bin/start-hbase.sh

Note:

At this point, the following Java processes should be running on the hbase-master machine.

xxx@master:$ jps
           14143 Jps
           14007 HQuorumPeer (or QuorumPeerMain, if ZooKeeper is managed externally)
           14066 HMaster
           9561 SecondaryNameNode
           9133 NameNode
           9783 ResourceManager

 

And the following Java processes should be running on the hbase-regionserver machine:

           23026 HRegionServer
           23171 Jps
           9311 DataNode
           9966 NodeManager

 

9. Starting the HBase shell:

$HBASE_INSTALL_DIR/bin/hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.1.4, r14c0e77956f9bb4c6edf0378474264843e4a82c3, Wed Mar 16 21:18:26 PDT 2016
hbase(main):001:0> create 't1','f1'
0 row(s) in 1.2910 seconds
hbase(main):002:0>

 

Note: If the table is created successfully, then everything is running fine.
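From the same shell you can list tables and clean up the test table; list, disable, and drop are standard HBase shell commands:

hbase(main):002:0> list
hbase(main):003:0> disable 't1'
hbase(main):004:0> drop 't1'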

 

10. Stopping the HBase Cluster:

Execute the following command on the hbase-master machine to stop the HBase cluster.

     $HBASE_INSTALL_DIR/bin/stop-hbase.sh
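If ZooKeeper is externally managed, remember to stop it separately after HBase has shut down:

     <zookeeper_home>/bin/zkServer.sh stop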

 

On top of HBase we can install Apache Phoenix, which is a SQL layer over HBase. For Phoenix installation, you can refer to my post Installing Phoenix – A step by step tutorial.

 


Installing Hadoop in fully distributed mode

Pre-requisite

1. Java JDK (This demo uses JDK version 1.7.0_67)

Make sure the JAVA_HOME environment variable points to the JDK, and that the Java executable's directory, i.e., $JAVA_HOME/bin, is on the PATH.

2. SSH configured

Make sure that the machines in the Hadoop cluster can SSH to each other without a password. In a multi-node setup, each machine should be able to do password-less SSH to and from every other machine in the cluster.

$ ssh-keygen -t rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
$ ssh-copy-id -i ~/.ssh/id_rsa.pub impadmin@master
$ ssh-copy-id -i ~/.ssh/id_rsa.pub impadmin@slave
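To verify the password-less setup, run a remote command from each machine; it should print the remote hostname without prompting for a password:

$ ssh master hostname
$ ssh slave hostname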

Installing And Configuring Hadoop

Assumptions –

For the purpose of clarity and ease of expression, I’ll be assuming that we are setting up a cluster of 2 nodes with IP Addresses

 10.10.10.1 – Namenode
 10.10.10.2 – Datanode
1. Download hadoop-2.6.4 and extract the installation tar on all the nodes at the same path. Create a dedicated user for Hadoop (we assume the dedicated user is "impadmin").

Make sure that the master and all the slaves use the same user.

2. Setup environment variables

Export environment variables as mentioned below for all nodes in the cluster.

export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_67
export HADOOP_PREFIX=/home/impadmin/hadoop-2.6.4
export PATH=$HADOOP_PREFIX/bin:$JAVA_HOME/bin:$PATH
export HADOOP_COMMON_HOME=$HADOOP_PREFIX
export HADOOP_HDFS_HOME=$HADOOP_PREFIX
export YARN_HOME=$HADOOP_PREFIX
export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
export YARN_CONF_DIR=$HADOOP_PREFIX/etc/hadoop

For this demo we have modified hadoop-env.sh to export these variables. You can also use ~/.bashrc, /etc/bash.bashrc, or another startup script to export them.
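If you export the variables from a shell startup script, you can quickly confirm they took effect; hadoop version should print release 2.6.4 without errors:

$ echo $HADOOP_PREFIX
/home/impadmin/hadoop-2.6.4
$ hadoop version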

Add the following lines at the start of etc/hadoop/yarn-env.sh:

export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_67
export HADOOP_PREFIX=/home/impadmin/hadoop-2.6.4
export PATH=$PATH:$HADOOP_PREFIX/bin:$JAVA_HOME/bin:.
export HADOOP_COMMON_HOME=$HADOOP_PREFIX
export HADOOP_HDFS_HOME=$HADOOP_PREFIX
export YARN_HOME=$HADOOP_PREFIX
export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
export YARN_CONF_DIR=$HADOOP_PREFIX/etc/hadoop

3. Create a folder for hadoop.tmp.dir
Create a temp folder in HADOOP_PREFIX

mkdir -p $HADOOP_PREFIX/tmp

4. Tweak config files
On all the machines in the cluster, go to the etc/hadoop folder under HADOOP_PREFIX and add the following properties under the configuration tag in the files mentioned below.

etc/hadoop/core-site.xml –

<property>
<name>fs.default.name</name>
<value>hdfs://Master-Hostname:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/impadmin/hadoop-2.6.4/tmp</value>
</property>

etc/hadoop/hdfs-site.xml :

<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>

etc/hadoop/mapred-site.xml :

<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>

etc/hadoop/yarn-site.xml :

<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>Master-Hostname:8025</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>Master-Hostname:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>Master-Hostname:8040</value>
</property>

Note: Make sure to replace “Master-Hostname” with your cluster’s master host name.
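To make the substitution in one shot on each node, you can use sed; the command below assumes the master's hostname is "master", as in the /etc/hosts entries from the HBase section:

$ cd $HADOOP_PREFIX
$ sed -i 's/Master-Hostname/master/g' etc/hadoop/core-site.xml etc/hadoop/yarn-site.xml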

5. Add slaves

Update HADOOP_PREFIX/etc/hadoop/slaves on the master machine to add the slave entries.

Open "slaves" and enter the hostnames of all the slaves, one per line.
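For the two-node cluster assumed here, the slaves file would contain a single entry:

slave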

6. Format namenode

This is a one-time activity. On the master, execute one of the following commands from HADOOP_PREFIX:

         $ bin/hadoop namenode -format
         $ bin/hdfs namenode -format

Once you have data on HDFS, DO NOT run this command again; doing so will result in loss of content.

7. Run hadoop daemons

Start the DFS daemons. From HADOOP_PREFIX on the master, execute:

$ sbin/start-dfs.sh
$ jps

Processes which should be running on the master after starting DFS:
NameNode
SecondaryNameNode
Jps

Check on the slave whether the DFS daemons started:

$ jps

Processes running on the slave:
DataNode
Jps

Start the YARN daemons. From HADOOP_PREFIX execute:

$ sbin/start-yarn.sh
$ jps

Processes running on the master:
NameNode
SecondaryNameNode
ResourceManager
Jps

Check on the slave whether the YARN daemons started:

$ jps

Processes running on the slave:
DataNode
NodeManager
Jps

8. Run sample and validate

Let’s run the wordcount sample to validate the setup. Make an input file/directory.

$ mkdir input
$ cat > input/file
This is a sample file.
This is a sample line.

Add this directory to HDFS:

    $ bin/hdfs dfs -copyFromLocal input /input
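You can confirm the upload with a listing:

    $ bin/hdfs dfs -ls /input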

 

Run the example:

$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar wordcount /input /output

To check the output, execute the below command:

   $ bin/hdfs dfs -cat /output/*
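For the two-line sample file above, the counts should come out as below (wordcount splits on whitespace, so punctuation sticks to the words):

This	2
a	2
file.	1
is	2
line.	1
sample	2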

 

9. Web interface

We can browse HDFS and check its health using http://masterHostname:50070 in the browser. We can also check the status of running applications via the ResourceManager web UI at the following

URL: http://masterHostname:8088

Done !!