
Most frequently used Hadoop commands

Commands useful for users of a Hadoop cluster.

1. appendToFile

Usage: hdfs dfs -appendToFile <localsrc> … <dst>

Append a single src, or multiple srcs, from the local file system to the destination file system. Also reads input from stdin and appends it to the destination file system.

hdfs dfs -appendToFile localfile /user/hadoop/hadoopfile
hdfs dfs -appendToFile localfile1 localfile2 /user/hadoop/hadoopfile
hdfs dfs -appendToFile localfile hdfs://nn.example.com/hadoop/hadoopfile
hdfs dfs -appendToFile - hdfs://nn.example.com/hadoop/hadoopfile Reads the input from stdin.

Exit Code:

Returns 0 on success and 1 on error.

2. cat

Usage: hdfs dfs -cat URI [URI …]

Copies source paths to stdout.

Example:

hdfs dfs -cat hdfs://nn1.example.com/file1 hdfs://nn2.example.com/file2
hdfs dfs -cat file:///file3 /user/hadoop/file4

Exit Code:

Returns 0 on success and -1 on error.

3. chmod

Usage: hdfs dfs -chmod [-R] <MODE[,MODE]… | OCTALMODE> URI [URI …]

Change the permissions of files. With -R, make the change recursively through the directory structure. The user must be the owner of the file, or else a super-user. Additional information is in the Permissions Guide.

Options

The -R option will make the change recursively through the directory structure.
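
Example (the paths and modes below are illustrative); the first command sets an octal mode on a single file, the second applies a mode recursively to a directory:

hdfs dfs -chmod 644 /user/hadoop/file1
hdfs dfs -chmod -R 755 /user/hadoop/dir1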

4. chown

Usage: hdfs dfs -chown [-R] [OWNER][:[GROUP]] URI [URI …]

Change the owner of files. The user must be a super-user. Additional information is in the Permissions Guide.

Options

The -R option will make the change recursively through the directory structure.
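
Example (the user, group and paths below are illustrative); the first command changes only the owner, the second changes owner and group recursively:

hdfs dfs -chown hduser /user/hadoop/file1
hdfs dfs -chown -R hduser:hadoopgroup /user/hadoop/dir1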

5. copyFromLocal

Usage: hdfs dfs -copyFromLocal <localsrc> URI

Similar to the put command, except that the source is restricted to a local file reference.

Options:

The -f option will overwrite the destination if it already exists.
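
Example (the local and HDFS paths below are illustrative); the second form overwrites the destination if it already exists:

hdfs dfs -copyFromLocal localfile /user/hadoop/hadoopfile
hdfs dfs -copyFromLocal -f localfile /user/hadoop/hadoopfile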

6. copyToLocal

Usage: hdfs dfs -copyToLocal [-ignorecrc] [-crc] URI <localdst>

Similar to the get command, except that the destination is restricted to a local file reference.
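
Example (paths are illustrative):

hdfs dfs -copyToLocal /user/hadoop/file localfile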

7. count

Usage: hdfs dfs -count [-q] [-h] <paths>

Count the number of directories, files and bytes under the paths that match the specified file pattern. The output columns with -count are: DIR_COUNT, FILE_COUNT, CONTENT_SIZE, FILE_NAME

The output columns with -count -q are: QUOTA, REMAINING_QUOTA, SPACE_QUOTA, REMAINING_SPACE_QUOTA, DIR_COUNT, FILE_COUNT, CONTENT_SIZE, FILE_NAME

The -h option shows sizes in human readable format.

Example:

hdfs dfs -count hdfs://nn1.example.com/file1 hdfs://nn2.example.com/file2
hdfs dfs -count -q hdfs://nn1.example.com/file1
hdfs dfs -count -q -h hdfs://nn1.example.com/file1

Exit Code:

Returns 0 on success and -1 on error.

8. cp

Usage: hdfs dfs -cp [-f] [-p | -p[topax]] URI [URI …] <dest>

Copy files from source to destination. This command allows multiple sources as well, in which case the destination must be a directory.

‘raw.*’ namespace extended attributes are preserved if (1) the source and destination filesystems support them (HDFS only), and (2) all source and destination pathnames are in the /.reserved/raw hierarchy. Determination of whether raw.* namespace xattrs are preserved is independent of the -p (preserve) flag.

Options:

The -f option will overwrite the destination if it already exists.
The -p option will preserve file attributes [topax] (timestamps, ownership, permission, ACL, XAttr). If -p is specified with no arg, then it preserves timestamps, ownership and permission. If -pa is specified, then it also preserves permission, because ACL is a super-set of permission. Determination of whether raw namespace extended attributes are preserved is independent of the -p flag.

Example:

hdfs dfs -cp /user/hadoop/file1 /user/hadoop/file2
hdfs dfs -cp /user/hadoop/file1 /user/hadoop/file2 /user/hadoop/dir

Exit Code:

Returns 0 on success and -1 on error.

9. du

Usage: hdfs dfs -du [-s] [-h] URI [URI …]

Displays sizes of files and directories contained in the given directory, or the length of a file in case it is just a file.

Options:

The -s option will result in an aggregate summary of file lengths being displayed, rather than the individual files.
The -h option will format file sizes in a “human-readable” fashion (e.g. 64.0m instead of 67108864)

Example:

hdfs dfs -du /user/hadoop/dir1 /user/hadoop/file1 hdfs://nn.example.com/user/hadoop/dir1

Exit Code: Returns 0 on success and -1 on error.

10. get

Usage: hdfs dfs -get [-ignorecrc] [-crc] <src> <localdst>

Copy files to the local file system. Files that fail the CRC check may be copied with the -ignorecrc option. Files and CRCs may be copied using the -crc option.

Example:

hdfs dfs -get /user/hadoop/file localfile
hdfs dfs -get hdfs://nn.example.com/user/hadoop/file localfile

Exit Code:

Returns 0 on success and -1 on error.

11. ls

Usage: hdfs dfs -ls [-R] <args>

Options:

The -R option will return stat recursively through the directory structure.

For a file returns stat on the file with the following format:

permissions number_of_replicas userid groupid filesize modification_date modification_time filename

For a directory it returns a list of its direct children, as in Unix. A directory is listed as:

permissions userid groupid modification_date modification_time dirname

Example:

hdfs dfs -ls /user/hadoop/file1

Exit Code:

Returns 0 on success and -1 on error.

12. lsr

Usage: hdfs dfs -lsr <args>

Recursive version of ls.

Note: This command is deprecated. Instead use hdfs dfs -ls -R
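
For example, to list a directory tree recursively with the non-deprecated form (the path is illustrative):

hdfs dfs -ls -R /user/hadoop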

13. mkdir

Usage: hdfs dfs -mkdir [-p] <paths>

Takes path URIs as arguments and creates directories.

Options:

The -p option behavior is much like Unix mkdir -p, creating parent directories along the path.

Example:

hdfs dfs -mkdir /user/hadoop/dir1 /user/hadoop/dir2
hdfs dfs -mkdir hdfs://nn1.example.com/user/hadoop/dir hdfs://nn2.example.com/user/hadoop/dir

Exit Code:

Returns 0 on success and -1 on error.

14. moveFromLocal

Usage: hdfs dfs -moveFromLocal <localsrc> <dst>

Similar to put command, except that the source localsrc is deleted after it’s copied.
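
Example (paths are illustrative); localfile is removed from the local file system once the copy succeeds:

hdfs dfs -moveFromLocal localfile /user/hadoop/hadoopfile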

15. moveToLocal

Usage: hdfs dfs -moveToLocal [-crc] <src> <dst>

Displays a “Not implemented yet” message.

16. mv

Usage: hdfs dfs -mv URI [URI …] <dest>

Moves files from source to destination. This command allows multiple sources as well, in which case the destination needs to be a directory. Moving files across file systems is not permitted.

Example:

hdfs dfs -mv /user/hadoop/file1 /user/hadoop/file2
hdfs dfs -mv hdfs://nn.example.com/file1 hdfs://nn.example.com/file2 hdfs://nn.example.com/file3 hdfs://nn.example.com/dir1

Exit Code:

Returns 0 on success and -1 on error.

17. put

Usage: hdfs dfs -put <localsrc> … <dst>

Copy a single src, or multiple srcs, from the local file system to the destination file system. Also reads input from stdin and writes to the destination file system.

hdfs dfs -put localfile /user/hadoop/hadoopfile
hdfs dfs -put localfile1 localfile2 /user/hadoop/hadoopdir
hdfs dfs -put localfile hdfs://nn.example.com/hadoop/hadoopfile
hdfs dfs -put - hdfs://nn.example.com/hadoop/hadoopfile Reads the input from stdin.

Exit Code:

Returns 0 on success and -1 on error.

18. rm

Usage: hdfs dfs -rm [-f] [-r|-R] [-skipTrash] URI [URI …]

Delete files specified as args.

Options:

The -f option will not display a diagnostic message or modify the exit status to reflect an error if the file does not exist.
The -R option deletes the directory and any content under it recursively.
The -r option is equivalent to -R.
The -skipTrash option will bypass trash, if enabled, and delete the specified file(s) immediately. This can be useful when it is necessary to delete files from an over-quota directory.

Example:

hdfs dfs -rm hdfs://nn.example.com/file /user/hadoop/emptydir

Exit Code:

Returns 0 on success and -1 on error.

19. rmr

Usage: hdfs dfs -rmr [-skipTrash] URI [URI …]

Recursive version of delete.

Note: This command is deprecated. Instead use hdfs dfs -rm -r
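
For example, to delete a directory tree with the non-deprecated form (the path is illustrative):

hdfs dfs -rm -r /user/hadoop/dir1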

20. text

Usage: hdfs dfs -text <src>

Takes a source file and outputs the file in text format. The allowed formats are zip and TextRecordInputStream.
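
Example (the path is illustrative; the file might be, for instance, a SequenceFile that -text decodes to plain text):

hdfs dfs -text /user/hadoop/seqfile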

21. touchz

Usage: hdfs dfs -touchz URI [URI …]

Create a file of zero length.

Example:

hdfs dfs -touchz pathname

Exit Code: Returns 0 on success and -1 on error.

 

Thanks 🙂

 


SolrCloud Setup on single machine

SolrCloud is the name of a set of new distributed capabilities in Solr. Passing parameters to enable these capabilities will enable you to set up a highly available, fault tolerant cluster of Solr servers. Use SolrCloud when you want high scale, fault tolerant, distributed indexing and search capabilities.

A little about SolrCores and Collections

On a single instance, Solr has something called a SolrCore that is essentially a single index. If you want multiple indexes, you create multiple SolrCores. With SolrCloud, a single index can span multiple Solr instances. This means that a single index can be made up of multiple SolrCores on different machines. We call all of these SolrCores that make up one logical index a collection. A collection is essentially a single index that spans many SolrCores, both for index scaling as well as redundancy. If you wanted to move your 2-SolrCore Solr setup to SolrCloud, you would have 2 collections, each made up of multiple individual SolrCores.
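
As an aside, once a SolrCloud cluster is running, a collection spanning several SolrCores can also be created through the Collections API instead of the bootstrap parameters used in the steps below; the collection name, shard count and replication factor here are only illustrative:

curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&replicationFactor=1'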

Steps to set up SolrCloud:

  1. Download solr-4.10.0 from http://lucene.apache.org/solr/downloads.html and unzip it.
  2. Since we’ll need two Solr servers representing the two shards of a collection, make a copy of the unzipped Solr folder for the second server, making sure you don’t have any data already indexed. In the command prompt, from the directory that contains the unzipped distribution, run:

     cp -r solr-4.10.0 solr2

  3. Go to the example folder of the first Solr server:

     cd solr-4.10.0/example

  4. Enter the following command, which starts up a Solr server and bootstraps a new Solr cluster:

     java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -DzkRun -DnumShards=2 -jar start.jar

     -Dbootstrap_confdir : path to the configuration directory; set it according to the location of the conf folder.
     -Dcollection.configName : name of the conf folder on ZooKeeper.
     -DnumShards : number of shards.

  5. Browse to http://localhost:8983/solr/#/~cloud to see the state of the cluster.
  6. Start the second server, pointing it at the cluster. In a second command prompt, go to the example folder of the solr2 copy and start it on a different port:

     cd solr2/example
     java -Djetty.port=7574 -DzkHost=localhost:9983 -jar start.jar

  7. You can see how your collection is deployed across the cluster by visiting the cloud panel in the Solr Admin UI: http://localhost:8983/solr/#/~cloud
  8. To check the health of the cluster:

     solr healthcheck -c <collection_name>
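
For example, for the collection1 collection bootstrapped above (a sketch that assumes the bin/solr script shipped with this Solr release supports the healthcheck command and its -z option for the ZooKeeper address):

bin/solr healthcheck -c collection1 -z localhost:9983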