In this post I will walk through the process of downloading and running Apache Spark on Windows 8 x64 in local mode on a single machine.
Prerequisites
- Java Development Kit (JDK 7 or 8). I installed it at ‘C:\Program Files\Java\jdk1.7.0_67’.
- Scala 2.11.7 (optional). I installed it at ‘C:\Program Files (x86)\scala’.
- After installation, we need to set the following environment variables:
- JAVA_HOME, the value is the JDK path.
In my case it is ‘C:\Program Files\Java\jdk1.7.0_67’ (for more details, click here).
Then append ‘%JAVA_HOME%\bin’ to the PATH environment variable.
- SCALA_HOME, the value is the Scala installation path.
In my case it is ‘C:\Program Files (x86)\scala’.
Then append ‘%SCALA_HOME%\bin’ to the PATH environment variable.
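For example, assuming the installation paths above, both variables can be set from a Command Prompt (the new values only apply to windows opened afterwards):
setx JAVA_HOME "C:\Program Files\Java\jdk1.7.0_67"
setx SCALA_HOME "C:\Program Files (x86)\scala"
For PATH, it is safer to append ‘%JAVA_HOME%\bin’ and ‘%SCALA_HOME%\bin’ through Control Panel > System > Advanced system settings > Environment Variables, since setx truncates values longer than 1024 characters.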
Downloading and installing Spark
1. Follow the instructions on http://spark.apache.org/docs/latest/ and download Spark 1.6.0 (Jan 04 2016) with the “Pre-built for Hadoop 2.6 and later” package type from http://spark.apache.org/downloads.html
2. Extract the downloaded archive to D:\Spark.
3. Spark has two shells; they are located in the ‘D:\Spark\bin\’ directory:
a. Scala shell (D:\Spark\bin\spark-shell.cmd).
b. Python shell (D:\Spark\bin\pyspark.cmd).
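For example, to start the Scala shell from a Command Prompt:
D:\Spark\bin\spark-shell.cmd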
4. You can run one of them, and you will see the following exception:
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
This issue is caused by a missing winutils.exe file, which Spark needs in order to initialize the Hive context; the Hive context in turn depends on Hadoop, which requires native libraries on Windows to work properly. Unfortunately, this happens even if you are using Spark in local mode without directly using any of the HDFS features.
To resolve this problem, you need to:
a. Download the 64-bit winutils.exe (106 KB):
- Direct download link https://github.com/steveloughran/winutils/raw/master/hadoop-2.6.0/bin/winutils.exe
- NOTE: there is a different winutils.exe file for 32-bit Windows, and it will not work on a 64-bit OS.
b. Copy the downloaded winutils.exe into a folder like D:\hadoop\bin (or D:\spark\hadoop\bin).
c. Set the environment variable HADOOP_HOME to point to the above directory, but without \bin. For example:
- if you copied the winutils.exe to D:\hadoop\bin, set HADOOP_HOME=D:\hadoop
- if you copied the winutils.exe to D:\spark\hadoop\bin, set HADOOP_HOME=D:\spark\hadoop
d. Double-check that the environment variable HADOOP_HOME is set properly by opening the Command Prompt and running echo %HADOOP_HOME%
e. You will also notice that when starting spark-shell.cmd, Hive creates a C:\tmp\hive folder. If you receive any errors related to the permissions of this folder, use the following commands to fix them:
- List current permissions: %HADOOP_HOME%\bin\winutils.exe ls \tmp\hive
- Set permissions: %HADOOP_HOME%\bin\winutils.exe chmod 777 \tmp\hive
- List updated permissions: %HADOOP_HOME%\bin\winutils.exe ls \tmp\hive
5. Re-run spark-shell; it should now work as expected.
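As a quick sanity check (a minimal sketch; any small computation will do), run a trivial job inside the Scala shell:
scala> sc.parallelize(1 to 1000).sum()
res0: Double = 500500.0
The predefined SparkContext sc distributes the range across local threads and sums it back; if this returns without errors, the installation is working.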
Text search sample
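Below is a minimal sketch in the Scala shell, assuming the distribution was extracted to D:\Spark so its bundled README.md is available (any text file path will do). It counts and prints the lines containing the word “Spark”:

// Load the file into an RDD of lines (sc is predefined in spark-shell)
val textFile = sc.textFile("D:/Spark/README.md")
// Keep only the lines that contain the search term
val matches = textFile.filter(line => line.contains("Spark"))
// Actions trigger the computation: count the matches, then print the first five
println(matches.count())
matches.take(5).foreach(println)

Note that the filter is evaluated lazily; nothing actually runs until an action such as count() or take() is called.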
Hope that will help!