Hadoop-Yarn Installation in Pseudo-distributed mode

Hadoop is an open-source framework for storing and processing big data in a distributed environment, across clusters of computers, using simple programming models. It is designed to scale from a single server to thousands of machines, each offering local computation and storage.

Prerequisites:

 1. Java JDK (This demo uses JDK version 1.7.0_67)

Make sure the JAVA_HOME environment variable points to the JDK, and that the directory containing the java executable, i.e., $JAVA_HOME/bin, is in the PATH environment variable.

 2. SSH configured

Make sure that the machines in the Hadoop cluster can ssh to each other without a password. For a single-node setup, the machine should be able to ssh to localhost. When ssh-keygen asks for a passphrase, press Enter to leave it empty.

 $ ssh-keygen -t rsa 
 $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys 
 $ chmod 0600 ~/.ssh/authorized_keys 

Installation Steps:

3. Download hadoop-2.6.4.tar.gz from http://hadoop.apache.org/releases.html and extract it to a path on your machine. This guide assumes that “impadmin” is the dedicated user for Hadoop.

4. Set up environment variables
Export the environment variables Hadoop needs, such as JAVA_HOME and HADOOP_PREFIX.

For this demo we modified “hadoop-env.sh” to export the variables.
You can also use ~/.bashrc, /etc/bash.bashrc or another startup script to export
these variables.
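
A minimal sketch of such exports, assuming the JDK lives under /usr/lib/jvm/jdk1.7.0_67 and Hadoop was extracted to /home/impadmin/hadoop-2.6.4. Both paths are hypothetical and should be adjusted to your machine; setting HADOOP_CONF_DIR and extending PATH are conveniences, not requirements of this guide.

```shell
# Hypothetical locations: change both paths to match your machine.
export JAVA_HOME="${JAVA_HOME:-/usr/lib/jvm/jdk1.7.0_67}"
export HADOOP_PREFIX="${HADOOP_PREFIX:-/home/impadmin/hadoop-2.6.4}"

# Directory that holds core-site.xml, hdfs-site.xml and the other config files.
export HADOOP_CONF_DIR="$HADOOP_PREFIX/etc/hadoop"

# Put java, the hadoop/hdfs commands and the daemon scripts on the PATH.
export PATH="$JAVA_HOME/bin:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin:$PATH"
```

The `${VAR:-default}` form keeps any value that is already set in the shell, so the defaults only apply when the variable is empty or unset.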

5. Create HDFS directories
Create two directories to be used by namenode and datanode.


 mkdir -p hdfs/namenode
 mkdir -p hdfs/datanode 
List the folders:
 ls -r hdfs
You will see:
 namenode datanode

6. Tweak config files
Go to the etc/hadoop folder under HADOOP_PREFIX and add the following
properties inside the <configuration> tag of the files mentioned below:







If this file does not exist, create it and paste the content provided below:

<?xml version="1.0"?>
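
As a minimal sketch of this step, the four site files could be generated from the shell as shown here. It assumes the usual single-node values: HDFS on localhost:9000, a replication factor of 1, and YARN as the MapReduce framework. The helper write_site, the CONF_DIR and HDFS_DIR variables, and every property value are illustrative assumptions, not necessarily the settings used for this demo. The script overwrites existing files, so back them up first.

```shell
# Assumed locations: the config folder and the HDFS directories from step 5.
CONF_DIR="${HADOOP_CONF_DIR:-etc/hadoop}"
HDFS_DIR="${HDFS_DIR:-$PWD/hdfs}"
mkdir -p "$CONF_DIR"

# write_site FILE NAME VALUE [NAME VALUE ...]
# Writes FILE under CONF_DIR with one <property> per NAME/VALUE pair.
write_site() {
    file="$CONF_DIR/$1"
    shift
    {
        echo '<?xml version="1.0"?>'
        echo '<configuration>'
        while [ "$#" -ge 2 ]; do
            printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
            shift 2
        done
        echo '</configuration>'
    } > "$file"
}

# Default file system URI used by HDFS clients.
write_site core-site.xml fs.defaultFS hdfs://localhost:9000

# One replica per block (single node) and the storage directories.
write_site hdfs-site.xml \
    dfs.replication 1 \
    dfs.namenode.name.dir "file://$HDFS_DIR/namenode" \
    dfs.datanode.data.dir "file://$HDFS_DIR/datanode"

# Run MapReduce jobs on YARN.
write_site mapred-site.xml mapreduce.framework.name yarn

# Shuffle service that MapReduce needs on each NodeManager.
write_site yarn-site.xml yarn.nodemanager.aux-services mapreduce_shuffle
```

Generating the files this way keeps all four in one place; editing them by hand works equally well.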

7. Format namenode
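This is a one-time activity.

 $ bin/hdfs namenode -format

The older form, bin/hadoop namenode -format, is deprecated in Hadoop 2.x and does the same thing, so run only one of the two.

Once you have data on HDFS, do NOT run this command again; doing so will
result in loss of that data.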

8. Run Hadoop daemons

Start DFS daemons:
From HADOOP_PREFIX, execute:

 $ sbin/start-dfs.sh
 $ jps

You will see the following processes running at this point:

 18831 SecondaryNameNode
 18983 Jps
 18343 NameNode
 18563 DataNode

Start YARN daemons:
From HADOOP_PREFIX, execute:

 $ sbin/start-yarn.sh
 $ jps

You will see the following processes at this point:

 18831 SecondaryNameNode
 18983 Jps
 18343 NameNode
 18563 DataNode
 19312 NodeManager
 19091 ResourceManager
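
Rather than checking the listing by eye, the process list above can be checked with a small script. This is a sketch only: missing_daemons is a hypothetical helper, and the sample listing stands in for real jps output.

```shell
# Print the name of every expected daemon that is absent from a jps listing.
# Usage: jps | missing_daemons
missing_daemons() {
    listing=$(cat)   # read the whole jps output from stdin
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # Compare the second column exactly, so "NameNode" does not
        # accidentally match "SecondaryNameNode".
        printf '%s\n' "$listing" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }' ||
            echo "$daemon"
    done
}

# Sample listing taken after start-dfs.sh only (YARN not started yet).
sample='18831 SecondaryNameNode
18983 Jps
18343 NameNode
18563 DataNode'

missing=$(printf '%s\n' "$sample" | missing_daemons)
echo "$missing"
```

On a running node, `jps | missing_daemons` prints nothing when all five daemons are up.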

Note: You can also use sbin/start-all.sh and sbin/stop-all.sh to start and stop the daemons.

Start Job History Server:
From HADOOP_PREFIX, execute:

 $ sbin/mr-jobhistory-daemon.sh start historyserver

9. Run sample and validate
Let’s run the wordcount sample to validate the setup.
Make an input directory and file. After typing the two sample lines, press Ctrl+D to end the input.

 $ mkdir input
 $ cat > input/file
 This is a sample file.
 This is a sample line.

Add this directory to HDFS:

 $ bin/hdfs dfs -copyFromLocal input /input

Run example:

 $ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar wordcount /input /output

To check the output, execute the command below:

 $ bin/hdfs dfs -cat /output/*
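
To judge whether the job output is right, the same counts can be computed locally without Hadoop. The sketch below assumes the two-line sample file from this step and that wordcount splits on whitespace and writes tab-separated word/count pairs in byte order; the variable name expected is illustrative.

```shell
# Recreate the sample input from this step.
mkdir -p input
printf 'This is a sample file.\nThis is a sample line.\n' > input/file

# Count each whitespace-separated token, then sort by byte value
# (LC_ALL=C) so that capitalised words come before lowercase ones.
expected=$(awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
                END { for (w in count) print w "\t" count[w] }' input/file |
           LC_ALL=C sort)

printf '%s\n' "$expected"
```

The HDFS output should list the same six words with the same counts, for example a count of 2 for "This" and 1 for "file.".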

10. Web interface
We can browse HDFS and check its health at http://localhost:50070 in the browser.


Installation Completed 🙂

