Installing Apache Spark in local mode on Windows 8

In this post I will walk through downloading and running Apache Spark on 64-bit Windows 8, in local mode on a single computer.

Prerequisites

  1. Java Development Kit (JDK 7 or 8). I installed it at ‘C:\Program Files\Java\jdk1.7.0_67’.
  2. Scala 2.11.7 (optional). I installed it at ‘C:\Program Files (x86)\scala’.
  3. After installation, set the following environment variables:
    1. JAVA_HOME: the JDK installation path.
      In my case it is ‘C:\Program Files\Java\jdk1.7.0_67’.
      Then append ‘%JAVA_HOME%\bin’ to the PATH environment variable.
    2. SCALA_HOME: the Scala installation path.
      In my case it is ‘C:\Program Files (x86)\scala’.
      Then append ‘%SCALA_HOME%\bin’ to the PATH environment variable.
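As a quick sanity check of the variables above, here is a minimal Python sketch (assuming a Python 3 interpreter is available; the helper name check_home is made up for illustration and is not part of Spark, Java, or Scala). It reports whether a variable such as JAVA_HOME is set, points to an existing directory, and has its bin folder on PATH.

```python
import os


def check_home(name, env=None):
    """Return a list of problems found for the variable `name` (e.g. JAVA_HOME)."""
    # `env` defaults to the real environment; a plain dict can be passed for testing.
    env = os.environ if env is None else env
    home = env.get(name)
    if not home:
        return ["%s is not set" % name]

    problems = []
    if not os.path.isdir(home):
        problems.append("%s points to a missing directory: %s" % (name, home))

    # Compare normalized paths so that case and trailing slashes do not matter.
    bin_dir = os.path.normcase(os.path.normpath(os.path.join(home, "bin")))
    path_entries = [
        os.path.normcase(os.path.normpath(entry))
        for entry in env.get("PATH", "").split(os.pathsep)
        if entry
    ]
    if bin_dir not in path_entries:
        problems.append("%s is not on PATH" % os.path.join(home, "bin"))
    return problems


if __name__ == "__main__":
    for var in ("JAVA_HOME", "SCALA_HOME"):
        issues = check_home(var)
        print(var, "OK" if not issues else "; ".join(issues))
```

Since Scala is optional, a "SCALA_HOME is not set" message can be ignored if you skipped that step.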

Downloading and installing Spark

1. Follow the instructions at http://spark.apache.org/docs/latest/ and download Spark 1.6.0 (Jan 04 2016) with the “Pre-built for Hadoop 2.6 and later” package type from http://spark.apache.org/downloads.html

2. Extract the downloaded archive to D:\Spark.

3. Spark has two shells, both located in the ‘D:\Spark\bin\’ directory:

a. Scala shell (D:\Spark\bin\spark-shell.cmd).
b. Python shell (D:\Spark\bin\pyspark.cmd).

4. Run either one of them, and you will see the following exception:

java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.

This issue is often caused by a missing winutils.exe file, which Spark needs in order to initialize the Hive context. The Hive context depends on Hadoop, which in turn requires native libraries on Windows to work properly. Unfortunately, this happens even if you run Spark in local mode without using any of the HDFS features directly.

To resolve this problem, you need to:

a. Download the 64-bit winutils.exe (106KB).

b. Copy the downloaded winutils.exe into a folder such as D:\hadoop\bin (or D:\spark\hadoop\bin).

c. Set the environment variable HADOOP_HOME to point to that directory, but without \bin. For example:

  • If you copied winutils.exe to D:\hadoop\bin, set HADOOP_HOME=D:\hadoop
  • If you copied winutils.exe to D:\spark\hadoop\bin, set HADOOP_HOME=D:\spark\hadoop

d. Double-check that HADOOP_HOME is set properly by opening the Command Prompt and running echo %HADOOP_HOME%

e. When starting spark-shell.cmd, Hive will also create a C:\tmp\hive folder. If you receive any errors related to the permissions of this folder, use the following commands to inspect and set them:

  • List current permissions: %HADOOP_HOME%\bin\winutils.exe ls \tmp\hive
  • Set permissions: %HADOOP_HOME%\bin\winutils.exe chmod 777 \tmp\hive
  • List updated permissions: %HADOOP_HOME%\bin\winutils.exe ls \tmp\hive
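To confirm steps b to d in one go, here is a small Python sketch (again assuming Python 3; find_winutils is a hypothetical helper name, not a Hadoop or Spark API). It looks for winutils.exe under %HADOOP_HOME%\bin, which is where the exception message above shows Hadoop searching.

```python
import os


def find_winutils(env=None):
    """Return the path to winutils.exe under HADOOP_HOME\\bin, or None if it is missing."""
    # `env` defaults to the real environment; a plain dict can be passed for testing.
    env = os.environ if env is None else env
    hadoop_home = env.get("HADOOP_HOME")
    if not hadoop_home:
        return None
    # HADOOP_HOME must NOT include the trailing "bin" folder; it is appended here.
    candidate = os.path.join(hadoop_home, "bin", "winutils.exe")
    return candidate if os.path.isfile(candidate) else None


if __name__ == "__main__":
    found = find_winutils()
    if found:
        print("winutils.exe found at", found)
    else:
        print("winutils.exe not found; check that HADOOP_HOME is set and does not end in bin")
```

If HADOOP_HOME was set only in the system dialog, open a new Command Prompt before running the check so that the new value is visible.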

5. Re-run spark-shell; it should now work as expected.

Text search sample

[Image: text search program]
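For readers who cannot see the image, here is a minimal text search sketch for the Python shell (pyspark.cmd). It is not the original program from the screenshot; the function names matches and count_matching_lines are made up for illustration. It relies only on the SparkContext that the shell provides as sc, and on the RDD calls textFile, filter and count.

```python
def matches(line, term):
    """Case-insensitive substring test applied to each line."""
    return term.lower() in line.lower()


def count_matching_lines(sc, path, term):
    """Count the lines of the text file at `path` that contain `term`.

    `sc` is a SparkContext, for example the one the pyspark shell creates.
    """
    lines = sc.textFile(path)  # one RDD element per line of the file
    return lines.filter(lambda line: matches(line, term)).count()
```

In the pyspark shell you could then try something like count_matching_lines(sc, "D:\\Spark\\README.md", "spark"), assuming Spark was extracted to D:\Spark and the README.md shipped with the distribution is still there.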

Hope that helps!

 
