Installing Apache Spark in local mode on Windows 8

In this post I will walk through downloading and running Apache Spark in local mode on a single Windows 8 x64 computer.

Prerequisites

Java Development Kit (JDK 7 or 8). I installed it in ‘C:\Program Files\Java\jdk1.7.0_67’.
Scala 2.11.7 (optional). I installed it in ‘C:\Program Files (x86)\scala’.
After installation, set the following environment variables:
1. JAVA_HOME, whose value is the JDK path.
  In my case it is ‘C:\Program Files\Java\jdk1.7.0_67’.
  Then append ‘%JAVA_HOME%\bin’ to the PATH environment variable.
2. SCALA_HOME, whose value is the Scala path.
  In my case it is ‘C:\Program Files (x86)\scala’.
  Then append ‘%SCALA_HOME%\bin’ to the PATH environment variable.
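To double-check these variables from a new Command Prompt, a small Python sketch like the one below can help. It is only an illustration and not part of Spark: the function name check_java_scala_env is hypothetical, and it assumes programs see PATH with %JAVA_HOME% and %SCALA_HOME% already expanded, which is how Windows normally passes it to new processes.

```python
import ntpath
import os


def check_java_scala_env(env, require_scala=False):
    """Return a list of problems in the JAVA_HOME / SCALA_HOME setup.

    `env` is a mapping such as os.environ. Paths are compared with
    Windows rules (case-insensitive, backslash separators) via ntpath,
    so the check behaves the same even when tried on another OS.
    """
    problems = []
    # Split PATH on ';' and normalise each entry (quotes, case, slashes).
    path_entries = {
        ntpath.normcase(ntpath.normpath(p.strip().strip('"')))
        for p in env.get("PATH", "").split(";")
        if p.strip()
    }
    # SCALA_HOME is optional in this setup unless require_scala is True.
    for var, required in (("JAVA_HOME", True), ("SCALA_HOME", require_scala)):
        home = env.get(var)
        if not home:
            if required:
                problems.append(f"{var} is not set")
            continue
        bin_dir = ntpath.normcase(ntpath.normpath(ntpath.join(home, "bin")))
        if bin_dir not in path_entries:
            problems.append(f"{var}\\bin is not on PATH")
    return problems


if __name__ == "__main__":
    issues = check_java_scala_env(os.environ)
    print("\n".join(issues) if issues else "JAVA_HOME and PATH look fine")
```

Treating SCALA_HOME as optional mirrors the prerequisites above, where installing Scala is optional.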

Downloading and installing Spark

1. Follow the instructions at http://spark.apache.org/docs/latest/ and download Spark 1.6.0 (released Jan 04 2016) with the “Pre-built for Hadoop 2.6 and later” package type from http://spark.apache.org/downloads.html


2. Extract the downloaded archive to D:\Spark.

3. Spark comes with two shells, located in the ‘D:\Spark\bin\’ directory:

a. Scala shell (D:\Spark\bin\spark-shell.cmd).
b. Python shell (D:\Spark\bin\pyspark.cmd).

4. Run either shell, and you will see the following exception:

java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.

This happens because winutils.exe is missing. Spark needs it to initialize the Hive context, which depends on Hadoop, and Hadoop requires native libraries on Windows to work properly. The null in the path appears because HADOOP_HOME is not set yet. Unfortunately, the error shows up even if you use Spark in local mode without using any HDFS features directly.


To resolve this problem, you need to:

a. Download the 64-bit winutils.exe (106 KB).

Direct download link: https://github.com/steveloughran/winutils/raw/master/hadoop-2.6.0/bin/winutils.exe
NOTE: there is a different winutils.exe for 32-bit Windows, and it will not work on a 64-bit OS.

b. Copy the downloaded winutils.exe into a folder such as D:\hadoop\bin (or D:\spark\hadoop\bin).

c. Set the environment variable HADOOP_HOME to point to that folder, but without the \bin part. For example:

if you copied the winutils.exe to D:\hadoop\bin, set HADOOP_HOME=D:\hadoop
if you copied the winutils.exe to D:\spark\hadoop\bin, set HADOOP_HOME=D:\spark\hadoop

d. Double-check that HADOOP_HOME is set properly by opening a new Command Prompt and running echo %HADOOP_HOME%

e. You will also notice that starting spark-shell.cmd makes Hive create a C:\tmp\hive folder. If you get errors about the permissions of this folder, use the following commands to set its permissions:

List current permissions: %HADOOP_HOME%\bin\winutils.exe ls \tmp\hive
Set permissions: %HADOOP_HOME%\bin\winutils.exe chmod 777 \tmp\hive
List updated permissions: %HADOOP_HOME%\bin\winutils.exe ls \tmp\hive

5. Re-run spark-shell; it should now work as expected.
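If the exception from step 4 persists, the HADOOP_HOME setup can be sanity-checked with the Python sketch below. It also models, in simplified form, why the message says null\bin\winutils.exe: assuming Hadoop 2.x behavior, the home directory comes from the hadoop.home.dir Java system property or, failing that, HADOOP_HOME, and \bin\winutils.exe is appended; when neither is set, Java string concatenation renders the missing value as "null". The helper names are hypothetical and are not Spark or Hadoop APIs.

```python
import ntpath
import os


def qualified_winutils_path(hadoop_home):
    """Simplified model of how Hadoop 2.x builds the winutils.exe path.

    A missing home directory becomes the text "null", which is what
    Java's string concatenation does with a null reference.
    """
    home = "null" if hadoop_home is None else hadoop_home
    return home + "\\bin\\winutils.exe"


def resolve_hadoop_home(system_props, env):
    """Prefer the hadoop.home.dir property, then fall back to HADOOP_HOME."""
    home = system_props.get("hadoop.home.dir")
    if home is None:
        home = env.get("HADOOP_HOME")
    return home


def hadoop_home_problems(hadoop_home, exists=os.path.isfile):
    """Return a list of problems with a HADOOP_HOME value.

    `exists` is injectable so the logic can be tried without a real disk.
    """
    if not hadoop_home:
        return ["HADOOP_HOME is not set"]
    problems = []
    # HADOOP_HOME must be the folder *above* bin, e.g. D:\hadoop.
    if ntpath.basename(ntpath.normpath(hadoop_home)).lower() == "bin":
        problems.append("HADOOP_HOME should point to the folder above \\bin")
    if not exists(qualified_winutils_path(hadoop_home)):
        problems.append("winutils.exe not found in HADOOP_HOME\\bin")
    return problems


if __name__ == "__main__":
    # Python cannot see JVM system properties, so pass none here.
    home = resolve_hadoop_home({}, os.environ)
    print(qualified_winutils_path(home))
    print(hadoop_home_problems(home) or "HADOOP_HOME looks fine")
```

Run from a new Command Prompt on Windows, an unset HADOOP_HOME prints null\bin\winutils.exe, matching the path in the original exception.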

Text search sample
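A minimal text search along the lines of the Spark quick start might look like this in PySpark, using the Python shell side of the installation. It is a sketch, not the author's original sample: the script name text_search.py, the helper names make_matcher and count_matching_lines, and the local[*] master URL are assumptions.

```python
import sys


def make_matcher(word, ignore_case=False):
    """Build the predicate used by the filter; plain Python, no Spark needed."""
    if ignore_case:
        needle = word.lower()
        return lambda line: needle in line.lower()
    return lambda line: word in line


def count_matching_lines(sc, path, word, ignore_case=False):
    """Count lines of a text file that contain `word`, using a SparkContext."""
    lines = sc.textFile(path)                        # RDD with one element per line
    matches = lines.filter(make_matcher(word, ignore_case))
    return matches.count()                           # action: triggers the job


if __name__ == "__main__":
    # Imported here so the helpers above work without PySpark installed.
    from pyspark import SparkContext

    path, word = sys.argv[1], sys.argv[2]
    sc = SparkContext("local[*]", "TextSearchSample")
    try:
        print(count_matching_lines(sc, path, word))
    finally:
        sc.stop()
```

Assuming the script is saved in D:\Spark, it could be run with bin\spark-submit.cmd text_search.py README.md Spark to count the lines of Spark's README that mention "Spark". Inside the pyspark shell, where sc already exists, the two functions can be pasted in and called directly.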


Hope that helps!
