Hadoop-Yarn Installation in Pseudo-distributed mode

Hadoop is an open-source framework that allows you to store and process big data in a distributed environment across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.

Prerequisites:

 1. Java JDK (This demo uses JDK version 1.7.0_67)

Make sure the JAVA_HOME environment variable points to the JDK, and that the java executable's directory, i.e., $JAVA_HOME/bin, is in the PATH environment variable.

 2. SSH configured

Make sure that the machines in the Hadoop cluster can SSH to each other without a password. In a single-node setup, the machine should be able to SSH to localhost.

 $ ssh-keygen -t rsa 
 $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys 
 $ chmod 0600 ~/.ssh/authorized_keys 
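
To verify, SSH to localhost; it should log you in without prompting for a password:

 $ ssh localhost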

Installation Steps:

3. Download hadoop-2.6.4.tar.gz from http://hadoop.apache.org/releases.html and extract it to a path on your machine. This guide assumes that "impadmin" is the dedicated user for Hadoop.
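
For example, assuming the archive was downloaded to impadmin's home directory, the extraction could be:

 $ tar -xzf hadoop-2.6.4.tar.gz -C /home/impadmin/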

4. Set up environment variables
Export the environment variables mentioned below.

 export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_67
 export HADOOP_PREFIX=/home/impadmin/hadoop-2.6.4
 export PATH=$PATH:$HADOOP_PREFIX/bin:$JAVA_HOME/bin:.
 export HADOOP_COMMON_HOME=$HADOOP_PREFIX
 export HADOOP_HDFS_HOME=$HADOOP_PREFIX
 export YARN_HOME=$HADOOP_PREFIX
 export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
 export YARN_CONF_DIR=$HADOOP_PREFIX/etc/hadoop

For this demo we modified "hadoop-env.sh" to export the variables. You can also use ~/.bashrc, /etc/bash.bashrc, or another startup script to export these variables.
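
If you exported the variables in a startup script such as ~/.bashrc, source it and confirm that the hadoop binary resolves:

 $ source ~/.bashrc
 $ hadoop version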

5. Create HDFS directories
Create two directories to be used by the namenode and the datanode.

Go to <HADOOP_PREFIX> and create them:

 $ mkdir -p hdfs/namenode
 $ mkdir -p hdfs/datanode

List the folders:

 $ ls -r hdfs

You will see:

 namenode datanode

6. Tweak config files
Go to the etc/hadoop folder under HADOOP_PREFIX and add the following properties under the <configuration> tag in the files mentioned below:

etc/hadoop/yarn-site.xml:

 <property>
   <name>yarn.nodemanager.aux-services</name>
   <value>mapreduce_shuffle</value>
 </property>
 <property>
   <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
   <value>org.apache.hadoop.mapred.ShuffleHandler</value>
 </property>

etc/hadoop/core-site.xml:

 <property>
   <name>fs.default.name</name>
   <value>hdfs://localhost:9000</value>
 </property>

etc/hadoop/hdfs-site.xml:

 <property>
   <name>dfs.replication</name>
   <value>1</value>
 </property>
 <property>
   <name>dfs.namenode.name.dir</name>
   <value>file:<HADOOP_PREFIX>/hdfs/namenode</value>
 </property>
 <property>
   <name>dfs.datanode.data.dir</name>
   <value>file:<HADOOP_PREFIX>/hdfs/datanode</value>
 </property>
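
Replace <HADOOP_PREFIX> with the actual path; with the setup from step 4 the values would look like:

 <value>file:/home/impadmin/hadoop-2.6.4/hdfs/namenode</value>
 <value>file:/home/impadmin/hadoop-2.6.4/hdfs/datanode</value>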

etc/hadoop/mapred-site.xml:
If this file does not exist, create it (e.g., by copying etc/hadoop/mapred-site.xml.template) and paste the content provided below:

 <?xml version="1.0"?>
 <configuration>
   <property>
     <name>mapreduce.framework.name</name>
     <value>yarn</value>
   </property>
 </configuration>

7. Format namenode
This is a one-time activity.

$ bin/hadoop namenode -format 
 or 
 $ bin/hdfs namenode -format 
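
On success, the output should include a log line similar to the following (with the namenode directory configured in hdfs-site.xml):

 INFO common.Storage: Storage directory /home/impadmin/hadoop-2.6.4/hdfs/namenode has been successfully formatted.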

Once you have your data on HDFS, DO NOT run this command again; doing so will result in loss of content.

8. Run Hadoop daemons

Start DFS daemons:
From <HADOOP_PREFIX> execute

 $ sbin/start-dfs.sh
 $ jps

You will see the following processes running at this point:

 18831 SecondaryNameNode
 18983 Jps
 18343 NameNode
 18563 DataNode

Start YARN daemons:
From <HADOOP_PREFIX> execute

 $ sbin/start-yarn.sh
 $ jps

You will see the following processes at this point:

 18831 SecondaryNameNode
 18983 Jps
 18343 NameNode
 18563 DataNode
 19312 NodeManager
 19091 ResourceManager

Note: you can also use start-all.sh and stop-all.sh to start/stop all the daemons at once; in Hadoop 2.x these scripts are deprecated but still functional.

Start Job History Server:
From <HADOOP_PREFIX> execute

 $ sbin/mr-jobhistory-daemon.sh start historyserver
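
Run jps again to verify; a JobHistoryServer process should now appear in the list. By default the job history web UI is then available at http://localhost:19888.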

9. Run sample and validate
Let’s run the wordcount sample to validate the setup.
Make an input directory and a sample file (press Ctrl-D to finish entering the cat input).

 $ mkdir input
 $ cat > input/file
 This is a sample file.
 This is a sample line.

Add this directory to HDFS:

 $ bin/hdfs dfs -copyFromLocal input /input
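
You can verify the upload by listing the directory on HDFS:

 $ bin/hdfs dfs -ls /input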

Run example:

 $ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar wordcount /input /output

To check the output, execute the command below:

$ bin/hdfs dfs -cat /output/*
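
For the two-line sample input above, the output (word and count per line, sorted by word) should be:

 This	2
 a	2
 file.	1
 is	2
 line.	1
 sample	2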

10. Web interface
We can browse HDFS and check its health using http://localhost:50070 in the browser. The YARN ResourceManager web UI is available at http://localhost:8088 by default.

 

Installation Completed 🙂
